Author, Researcher, Speaker, Director, DevOps Enthusiast at "The Unicorn Project: A Novel About Developers, Digital Disruption, and Thriving in the Age of Data"
18. The Downward Spiral
Operations Sees…
Fragile applications are prone to failure
Long time required to figure out “which bit got flipped”
Detective control is a salesperson
Too much time required to restore service
Too much firefighting and unplanned work
Planned project work cannot complete
Frustrated customers leave
Market share goes down
Business misses Wall Street commitments
Business makes even larger promises to Wall Street
Dev Sees…
More urgent, date-driven projects put into the queue
Even more fragile code put into production
More releases have increasingly “turbulent installs”
Release cycles lengthen to amortize “cost of deployments”
Failing bigger deployments more difficult to diagnose
Most senior and constrained IT ops resources have less time to fix underlying process problems
Ever increasing backlog of infrastructure projects that could fix root cause and reduce costs
Ever increasing amount of tension between IT Ops and Development
These aren’t IT Operations problems…
These are business problems!
21. @RealGeneKim
Easy Lines To Get Started
Problem
“What’s the difference between a good day and a bad day?”
“What keeps you up at night?”
22. @RealGeneKim
Easy Lines To Get Started
Significance
“Does anyone really care if that bad thing happens?”
“On a scale of 1-10, how big of a problem is this?”
“So what?”
23. @RealGeneKim
Easy Lines To Get Started
Solution
“If you could wave a magic wand, what would you do?”
“If you were king/queen, what would it look like?”
24. @RealGeneKim
Easy Lines To Get Started
Value
“What’s in it for you?”
“In six months, if all this comes true, what does life look like for you?”
25. @RealGeneKim
Stages Of Value Selling
Problem
Significance
Solution
Value
When you do this, it should give you confidence that you’re not wasting anyone’s time.
35. @RealGeneKim
Value To Infosec
Integrate security testing into daily Dev work
Reduce time from “find to fix”
Reduce surface area of risk
Non-functional requirements (Anonymous can do 6 GB/sec DDoS: how can we survive it?)
Enforce consistency
Build in auditability
Have reliance on IT Ops tools in daily use
Traceability of production artifacts
41. @RealGeneKim
Our Desired Future Reality
Installs are predictable and require less time/effort than ever
Engineering teams take decisive steps to correct bad installs (and they don’t happen again)
We are deploying code faster than ever, and can quickly detect and recover
We have operational discipline to enforce a structured resolution process
Less unexpected downtime
Schedule and complete infrastructure improvement projects
Bad installs rarely have a cascading effect
Business unit releases are on schedule (vs delayed)
Customers rarely leave
We’re winning customers
We exceed our 20% growth target
Our business is hitting earnings targets
We can tackle even more projects, hire more stars, etc.
42. @RealGeneKim
High Performing DevOps Teams
They’re more agile
30x more frequent deployments
8,000x faster cycle time than their peers
They’re more reliable
2x the change success rate
12x faster MTTR
Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
43. How organizations achieve high performance
• 89% have infrastructure artifacts in version control
• 82% have automated process to create environments
Source: Puppet Labs 2012 DevOps Survey Of Practice
44. Performance by DevOps maturity
Organizations that implemented DevOps practices over 12 months ago were 5x more likely to be high performing than organizations that weren’t implementing DevOps at all.
Source: Puppet Labs 2012 DevOps Survey Of Practice
45. @RealGeneKim
Who Is Doing DevOps?
Google, Amazon, Netflix, Etsy, Twitter, Facebook, Pinterest…
BNY Mellon, Bank of America, World Bank, Paychex, Intuit…
The Gap, Nordstrom, REI, Macy’s, GameStop…
Portland State University, Seton Hill University, Kansas State University…
Who else?
50. @RealGeneKim
“This book will have a profound effect on IT, just as The Goal did for manufacturing.” --Jez Humble, co-author, Continuous Delivery
“This is the IT swamp draining manual for anyone who is neck deep in alligators.” --Adrian Cockcroft, Cloud Architect at Netflix
“This is The Goal for our decade, and is for any IT professional who wants their life back.” --Charles Betz, IT architect, author, “Architecture and Patterns for IT”
55. @RealGeneKim
If I Could Wave A Magic Wand, Everyone Will…
Be energized about how practitioners can contribute in this organizational journey
Leave with some concrete steps to get some great outcomes
Help create the coalition that starts putting DevOps practices into place
56. @RealGeneKim
“Some books you give to friends, for the joy of sharing a great novel.
“Some books you recommend to your colleagues and employees, to create common ground.
“Some books you share with your boss, to plant the seeds of a big idea.
“The Phoenix Project is all three.”
--Jeremiah Shirk, Integration & Infrastructure Manager at Kansas State University
57. @RealGeneKim
Our Mission: Positively Impact The Lives Of
One Million IT Workers By 2017
Free 170 page excerpt: http://itrevolution.com/the-phoenix-project-excerpt/
http://slideshare.net/realgenekim
DevOps Defensive Audit Toolkit
Enterprise DevOps Case Studies
Early draft of upcoming “DevOps Cookbook” (Allspaw, DeBois, Edwards, Humble, Kim, Orzen)
Email me at genek@realgenekim.me
In fact, I think this destructive pattern is the root cause of one of the biggest problems we face as a profession, and that fixing it has the potential to generate more economic value than anything we’ve seen in 30 years. I’m going to share with you what this destructive pattern is, and maybe it’ll sound familiar to you. It’s a story that can be told in four acts.
Who are they auditing? IT operations. I love IT operations. Why? Because when the developers screw up, the only people who can save the day are the IT operations people. Memory leak? No problem, we’ll do hourly reboots until you figure that out. Who here is from IT operations?
Bad day:
Not as prepared for the audit as they thought
Spending 30% of their time scrambling, generating presentations for auditors
Or an outage, and the developer is adamant that they didn’t make the change. They’re saying, “It must be the security guys, they’re always causing outages”
Or there are 50 systems behind the load balancer, and six systems are acting funny: what’s different, and who made them different?
Or every server is like a snowflake, each having its own personality
We as Tripwire practitioners can help them make sure changes are made visible, authorized, and deployed completely and accurately, and help them find differences
Create and enforce a culture of change management and causality
Source: Flickr: birdsandanchors
Who’s introducing variance? Well, it’s often these guys. Show me a developer who isn’t causing an outage, and I’ll show you one who is on vacation.
Primary measurement is deploying features quickly: get to market.
I’ve worked with two of the five largest Internet companies (Google, Microsoft, Yahoo, AOL, Amazon), and I now believe that the biggest differentiator for great time to market is great operations.
Bad day:
We do 6 weeks of testing, but deployment still fails. Why? The QA environment doesn’t match production
Or there’s a failure in testing, and no one can agree whether it’s a code failure or an environment failure
Or changes are made in QA, but no one wrote them down, so they didn’t get replicated downstream in production
Believe it or not, we as Tripwire practitioners can even help them: make sure environments are available when we need them, that they’re configured correctly the first time, document all the changes, and replicate them downstream
[ picture of messy data center ] Ten minutes into Bill’s first day on the job, he has to deal with a payroll run failure. Tomorrow is payday, and finance just found out that while all the salaried employees are going to get paid, none of the hourly factory employees will. All their records from the factory timekeeping systems were zeroed out. Was it a SAN failure? A database failure? An application failure? Interface failure? Cabling error?
So who are all these constituencies that we can help, and increase our relevance as Tripwire practitioners and champions?
How many people here are in infosec?
Goal: protect critical systems and data
Safeguard organizational commitments
Prevent security breaches, help quickly detect and recover from them
Bad day: no security standards
No one is complying
Yes, we’re 3 years behind. “Whaddyagonna do about it?”
Vs. we (Tripwire owners) can become more relevant and add value by helping infosec leverage all the configuration guidance out there
Measure variance between production and those known good states
Trust and verify that when management says, “We’ve trued up the configurations,” they’ve actually done it
Why? Now, more than ever, there is an ever-increasing number of regulatory and contractual requirements to protect systems and data
There are many ways to react to this, like fear, horror, trying to become invisible… All understandable, given the circumstances, because infosec can no longer take 4 weeks to turn around a security review for application code, or take 6 weeks to turn around a firewall change. But, on the other hand, I think it will be the best thing to happen to infosec in the past 20 years. We’re calling this Rugged DevOps, because it’s a way for infosec to integrate into the DevOps process and be welcomed, and not be viewed as the shrill hysterical folks who slow the business down.
Tell story of Amazon, Netflix: they care about availability, security
It’s not a push, it’s a pull: they’re looking for our help (#1 concern: fear of disintermediation and being marginalized)
How each side actively impedes the achievement of the other’s goals.
Two things:
Arguing with him was like arguing with Columbo: he was always five steps ahead of you, and he was so disarming
“Before I can trust you, I first need to know you care”
“Genuine Curiosity”: he integrated patterns so much into the fiber of his being
That’s Jez Humble of “Continuous Delivery” fame (@jezhumble) in the picture and me sitting together at PuppetConf 2012.
All of that code has moved to our B2 repo
Explosive growth in B2 development internally.
5x growth
We aren't just saying we are using B2s - we're doing it every single day.
The Goal introduces the Theory of Constraints, and has been the most influential book in Gene’s career. The book is taught in most MBA programs. What I love about The Goal is that it’s a novel. It’s the story of Alex, who is a plant manager. He has to fix the cost and due date issues in 90 days, otherwise the plant will be shut down. You live the world through Alex’s eyes, as he discovers that almost everything he believes about plant management is wrong and predestined for failure. You meet his wife and children, and his great epiphany is actually out on a…
Gene asked me to read an early copy of When IT Fails, a Business Novel. Just like The Goal, it’s a novel, and the first 170 pages describe the problems the company and IT team are facing, from multiple perspectives. It’s an engaging story that earns us the right to describe what the solution should look like. Gene, why did you and the co-authors model the book so closely on The Goal?
The solution to any complex business problem requires different stakeholders, and in order to bring them together you first need empathy for what the problem looks like from Operations, Development, Security, Service Management and the Business.
Eran Feigenbaum, Director of Security, Google Enterprise