@RealGeneKim
Gene Kim
Docker And DevOps:
My Fifteen Year Journey Studying
High Performing IT Organizations
The Downward
Spiral…
IT Ops And Dev At War
There Is A Better Way…
Google, Amazon, Netflix,
Spotify, Etsy, Twitter,
Facebook…
Source: John Allspaw
10 deploys per day
Dev & ops cooperation at Flickr
John Allspaw & Paul Hammond
Velocity 2009
Source: John Jenkins, Amazon.com
Making Changes When It Matters Most
“By installing a rampant innovation culture,
we performed 165 experiments in the peak three
months of tax season.”
–Scott Cook, Intuit Founder
“Our business result? Conversion rate of the
website is up 50 percent. Employee result?
Everyone loves it, because now their ideas can
make it to market.”
Who Is Doing DevOps?
 Google, Amazon, Netflix, Etsy, Spotify, Twitter, Facebook …
 CSC, IBM, CA, SAP, HP, Microsoft, Red Hat …
 GE Capital, Nationwide, BNP Paribas, BNY Mellon,
World Bank, Paychex, Intuit …
 The Gap, Nordstrom, Macy’s, Williams-Sonoma, Target …
 General Motors, Northrop Grumman, LEGO, Bosch …
 UK Government, US Department of Homeland Security …
 Kansas State University…
Who else?
High Performing DevOps Teams
 They’re more agile
 30x more frequent deployments
 8,000x faster lead time than their peers
 They’re more reliable
 2x the change success rate
 12x faster MTTR
Source: Puppet Labs 2013 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
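To make the two reliability measures above concrete, here is a minimal sketch of how a team might compute change success rate and MTTR from its own deployment records. The `Change` record and its fields are assumptions made for illustration; they are not taken from the Puppet Labs survey.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Change:
    """One production change (hypothetical record layout)."""
    deployed_at: datetime
    failed: bool
    restored_at: Optional[datetime] = None  # set only once a failed change is fixed


def change_success_rate(changes: List[Change]) -> float:
    """Fraction of changes that did not cause a failure."""
    succeeded = sum(1 for c in changes if not c.failed)
    return succeeded / len(changes)


def mean_time_to_restore(changes: List[Change]) -> timedelta:
    """Average time from a failed deployment to restored service."""
    durations = [
        c.restored_at - c.deployed_at
        for c in changes
        if c.failed and c.restored_at is not None
    ]
    return sum(durations, timedelta()) / len(durations)
```

Tracking both numbers together matters: a team can look reliable simply by changing little, so success rate only says something when read next to deployment frequency.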
Organizations with high performing DevOps
teams were 2.5x more likely to
exceed profitability, market share and
productivity goals…
…and had 50% higher market capitalization
growth over 3 years…
Source: Puppet Labs 2014 State Of DevOps
HP LaserJet Firmware
HP LaserJet Firmware CI/CD Pipeline
“This book will have a profound effect on IT,
just as The Goal did for manufacturing.”
–Jez Humble,
co-author Continuous Delivery
“This is the IT swamp draining manual for
anyone who is neck deep in alligators.”
–Adrian Cockcroft,
Cloud Architect at Netflix
“This is The Goal for our decade,
and is for any IT professional who wants
their life back.”
–Charles Betz, IT architect, author
“Architecture and Patterns for IT”
The First Way: Flow
“deploys per day”
vs.
“lead time”
“What is your lead time
for changes?”
“How long does it take to go from
code committed to code successfully
running in production?”
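One way to answer that question with data rather than a guess is to record, for each change, when it was committed and when it was running in production. The sketch below assumes you can export those two timestamps per change; the function names are illustrative.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Tuple

# Each pair is (committed_at, running_in_production_at) for one change.
CommitDeployPair = Tuple[datetime, datetime]


def lead_times(pairs: List[CommitDeployPair]) -> List[timedelta]:
    """Lead time per change: code committed -> code running in production."""
    return [deployed - committed for committed, deployed in pairs]


def median_lead_time(pairs: List[CommitDeployPair]) -> timedelta:
    """Median is used so one stuck change does not dominate the answer."""
    seconds = [lt.total_seconds() for lt in lead_times(pairs)]
    return timedelta(seconds=median(seconds))
```

The median is a design choice for this sketch, not something the talk prescribes; a percentile such as the 90th would show the slow tail instead.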
IT’S A TRAP
Create One Step Environment
Creation Process
 Make environments available early in the
Development process
 Make sure Dev builds the code and environment
at the same time
 Create a common Dev, QA and Production
environment creation process
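A minimal sketch of what a common creation process can look like: one function builds every environment from the same base definition, and each environment differs only by a small, explicit set of overrides. All names and values here (`EnvironmentSpec`, the image tag, the database URLs) are hypothetical placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EnvironmentSpec:
    """Everything needed to build an environment (hypothetical fields)."""
    os_image: str
    app_version: str
    replicas: int
    database_url: str


# One shared definition, kept in version control next to the code.
BASE = EnvironmentSpec(
    os_image="base-image:1.0",
    app_version="1.4.2",
    replicas=1,
    database_url="postgres://localhost/app",
)

# Only the differences are listed per environment.
OVERRIDES = {
    "dev": {},
    "qa": {"database_url": "postgres://qa-db/app"},
    "production": {"replicas": 4, "database_url": "postgres://prod-db/app"},
}


def create_environment(name: str) -> EnvironmentSpec:
    """The single step: same process for Dev, QA and Production."""
    return replace(BASE, **OVERRIDES[name])
```

Because Dev, QA and Production all go through `create_environment`, the OS image and application version cannot drift apart between them without someone editing the shared definition.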
If I had a magic wand,
I’d change the Agile sprints and
definition of “done”:
“At the end of each sprint, we must
have working and shippable code…
demonstrated in an environment
that resembles production.”
Deploy Smaller Changes, More Frequently *
Source: http://www.facebook.com/note.php?note_id=14218138919
Deploy Smaller Changes, More Frequently *
 Decouple feature releases from code
deployments
 Deploy features in a disabled state, using feature
flags
 Require all developers check code into trunk
daily (at least)
 Practice deploying smaller changes, which
dramatically reduces risk and improves MTTR
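The feature-flag practice above can be sketched in a few lines. This assumes a simple in-memory flag table; in practice the flags would live in configuration or a flag service, and the names `FLAGS`, `is_enabled` and `checkout` are made up for the example.

```python
# Flags ship with the code; new features default to "off".
FLAGS = {"new_checkout": False}


def is_enabled(flag: str, flags: dict = FLAGS) -> bool:
    """Unknown flags are treated as disabled."""
    return flags.get(flag, False)


def checkout(cart_total: float, flags: dict = FLAGS) -> str:
    """The new code path is deployed but dormant until the flag is flipped."""
    if is_enabled("new_checkout", flags):
        return f"new checkout: {cart_total:.2f}"
    return f"legacy checkout: {cart_total:.2f}"
```

Deploying the code and releasing the feature become two separate events: the deployment carries the dormant path to production, and the release is a flag change that can be reversed without another deployment.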
Experiment: Reducing Batch Size By 50%
Source: Scott Prugh, Chief Architect, CSG, Inc.
And the customer got the feature in
half the time!
Breaking The Bottlenecks In The Flow
 Environment creation
 Code deployment
 Test setup and run (mention @rohansingh)
 Overly tight architecture
 Development
 Product management
“In November 2011, running even the most minimal
test for CloudFoundry required deploying to 45 virtual
machines, which took a half hour. This was way too
long, and also prevented developers from testing on
their own workstations.
By using containers, within months, we got it down to
18 virtual machines so that any developer can deploy
the entire system to a single VM in six minutes.”
— Elisabeth Hendrickson, Director of Quality
Engineering, Pivotal Labs
Blackboard Learn: 2005-Present
Blackboard Learn Building Blocks
Top Predictors Of IT Performance
 Version control of all production artifacts
 Continuous integration and deployment
 Automated acceptance testing
 Peer-review of production changes (vs. external
change approval)
 High trust culture
 Proactive monitoring of the production environment
 Win-win relationship between Dev and Ops
The First Way: Outcomes
 Creating single repository for code and environments
 Determinism in the release process
 Consistent Dev, Test and Production environments, all properly
built before deployment begins
 Features being deployed daily without catastrophic failures
 Decreased lead time
 Faster cycle time and release cadence
The Second Way: Feedback
How many times is the andon cord
pulled in a typical day at a Toyota
manufacturing plant?
3,500 times per day
Why would Toyota do something so disruptive as
stopping production thousands of times per day?
“It’s the only way we can build 2,000 vehicles
per day – that’s one completed vehicle every
55 seconds.”
"Automated tests transform fear into boredom."
-- Eran Messeri, Google
Google Dev And Ops (2013)
 15,000 engineers, working on 4,000+ projects
 All code is checked into one source tree
(billions of files!)
 5,500 code commits/day
 75 million test cases are run daily
Developers Carry Pagers
“We found that when we woke up developers at
2am, defects got fixed faster than ever”
– Patrick Lightbody,
CEO, BrowserMob
“You build it, you run it.”
– Werner Vogels
CTO, Amazon
Developers Carry Pagers
“As a developer, there has never been a more
satisfying point in my career than when I wrote
the code, I pushed the button to deploy it,
I watched the metrics to see if it actually worked
in production, and fixed it if it broke.”
– Tim Tischler
Director of Operations Engr,
Nike, Inc.
Pervasive Production Telemetry
“Having a
developer add a
monitoring metric
shouldn’t feel like
a schema
change.”
– John Allspaw,
SVP Tech Ops,
Etsy
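What "not like a schema change" can mean in code: adding a metric should be a one-line call with no up-front registration. The sketch below uses a tiny in-process counter registry as a stand-in for a real metrics client; `Metrics`, `increment` and `handle_login` are hypothetical names, not Etsy's tooling.

```python
from collections import Counter


class Metrics:
    """In-process stand-in for a metrics client (illustrative only)."""

    def __init__(self) -> None:
        self._counters = Counter()

    def increment(self, name: str, value: int = 1) -> None:
        # No registration step: a new metric name simply starts counting.
        self._counters[name] += value

    def snapshot(self) -> dict:
        return dict(self._counters)


metrics = Metrics()


def handle_login(success: bool) -> None:
    # Each new metric costs the developer exactly one line.
    metrics.increment("login.attempt")
    if not success:
        metrics.increment("login.failure")
```

The point of the design is the absence of ceremony: no ticket, no central definition file, no deploy of the monitoring system before the metric exists.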
One Of The Highest Predictors Of
Performance
Top Predictors Of IT Performance
 Version control of all production artifacts
 Continuous integration and deployment
 Automated acceptance testing
 Peer-review of production changes (vs. external
change approval)
 High trust culture
 Proactive monitoring of the production environment
 Win-win relationship between Dev and Ops
The Second Way: Outcomes
 Defects and security issues getting fixed faster than ever
 Disciplined automated testing enabling many
simultaneous small, agile teams to work productively
 All groups communicating and coordinating better
 Everybody is getting more work done
The Third Way:
Continual Experimentation And Learning
You Don’t Choose Chaos Monkey…
Chaos Monkey Chooses You
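The idea behind Chaos Monkey, randomly terminating a running instance so that teams must build services that survive the loss, can be sketched as follows. This is an illustration of the concept only and is not Netflix's implementation; the `terminate` callback is an assumption standing in for whatever your platform uses to stop an instance.

```python
import random
from typing import Callable, Iterable, Optional


def pick_victim(instances: Iterable[str], rng=random) -> Optional[str]:
    """Choose one running instance at random, or None if there are none."""
    candidates = sorted(instances)  # sorted so a seeded rng is reproducible
    if not candidates:
        return None
    return rng.choice(candidates)


def run_chaos(
    instances: Iterable[str],
    terminate: Callable[[str], None],
    rng=random,
) -> Optional[str]:
    """Terminate one randomly chosen instance and report which one."""
    victim = pick_victim(instances, rng)
    if victim is not None:
        terminate(victim)
    return victim
```

Passing the random source in as `rng` keeps the sketch testable: a seeded `random.Random` makes a run repeatable while production use keeps the default module-level generator.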
Allocate 20% Of Cycles To Technical
Debt Reduction
“By November 2011, Kevin Scott,
LinkedIn’s top engineer, had had
enough. The system was taxed as
LinkedIn attracted more users, and
engineers were burnt out.
“To fix the problems, Scott, who’d
arrived from Google that February,
launched Operation InVersion.
“He froze development on new
features so engineers could overhaul
the computing architecture.
“‘We had to tell management we’re
not going to deliver anything new
while all of engineering works on this
project for the next two months,’
Scott says. ‘It was a scary thing.’”
Source: Pingdom
Why Do I Think This Is
Important?
The Downward
Spiral…
Opportunity Cost Of
Wasted IT Spending?
$2,600,000,000,000.00 per year
($2.6 Trillion US)
DevOps Enterprise Summit
 Save the date: October 21-23, 2014
 DevOps Enterprise is a conference for horses, by horses
 Macy’s, Disney, GE Capital, Blackboard, Telstra, US Department of
Homeland Security, CSG
 Leaders driving DevOps transformations will talk about
 The business problem they set out to solve
 The obstacles they had to overcome
 The business value they created
 Submit talks at: http://devopsenterprisesummit.com/
Our Mission: Positively Impact The Lives
Of One Million IT Professionals By 2017
 Free 170 page excerpt:
http://itrevolution.com/the-phoenix-project-excerpt/
 http://slideshare.net/realgenekim
 DevOps Defensive Audit Toolkit:
http://bit.ly/DevOpsAudit
 Early draft of upcoming “DevOps Cookbook”
(Allspaw, Debois, Edwards, Humble, Kim, Willis)
 Email me at genek@realgenekim.me

Editor's Notes

  • #2 My name is Gene Kim. My area of passion started when I was the CTO and founder of Tripwire in 1999. I started keeping a list that we called “Gene’s list of people with great kung fu.” These were the organizations that simultaneously… In the next 25 minutes, I’m really excited to share with you some of my key learnings, which I’m hoping will not only be applicable to you, but that you’ll be able to put into practice right away and get some amazing results. But let me tell you how my journey began…
  • #4 [ picture of messy data center ] Ten minutes into Bill’s first day on the job, he has to deal with a payroll run failure. Tomorrow is payday, and finance just found out that while all the salaried employees are going to get paid, none of the hourly factory employees will. All their records from the factory timekeeping systems were zeroed out. Was it a SAN failure? A database failure? An application failure? Interface failure? Cabling error?
  • #6 Source: http://biobreak.wordpress.com/2010/10/07/games-evangelism-dos-and-donts/
  • #11 There are many ways to react to this: fear, horror, trying to become invisible… All understandable, given the circumstances… Because infosec can no longer take 4 weeks to turn around a security review for application code, or take 6 weeks to turn around a firewall change. But, on the other hand, I think it will be the best thing to happen to infosec in the past 20 years. We’re calling this Rugged DevOps, because it’s a way for infosec to integrate into the DevOps process, and be welcomed. And not be viewed as the shrill hysterical folks who slow the business down.
  • #19 Tell story of Amazon, Netflix: they care about availability and security. It’s not a push, it’s a pull – they’re looking for our help (#1 concern: fear of disintermediation and being marginalized)