© 2015 IBM Corporation
Sanjeev Sharma
CTO – DevOps Adoption
IBM Distinguished Engineer
@sd_architect | sdarchitect.blog
From Apollo 13 to Google SRE
When DevOps met SRE
© 2016 IBM Corporation
#WhoAmI
• 20+ Years in Software
Development and Delivery
• IBM Distinguished Engineer and
CTO for DevOps Adoption
• Author of two DevOps books:
• DevOps For Dummies:
https://ibm.biz/BdsPMX
• The DevOps Adoption Playbook:
http://amzn.to/2hH7rt2
• Blog: https://sdarchitect.blog
• @sd_architect
What is SRE?
“SRE is what happens when you ask a software engineer to design an operations team.”
- Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy, “Site Reliability Engineering”
Site Reliability Engineering (SRE): Google’s approach to Service Management
Apollo 13 – The real heroes
Image Courtesy:
Universal Pictures, NASA
Reliability: The Real Availability Numbers!
How much downtime does 5-nines (99.999%) availability translate to?
• Daily: 0.9s
• Weekly: 6.0s
• Monthly: 26.3s
• Yearly: 5m 15.6s
4-nines or 99.99% translates to downtime of:
• Daily: 8.6s
• Weekly: 1m 0.5s
• Monthly: 4m 23.0s
• Yearly: 52m 35.7s
Even the more common 99.95% availability SLO allows a mere 43 seconds/day, or about 5 minutes (5m 2.4s) per week, of downtime.
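The downtime figures above follow from simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the length of the period. A minimal sketch, assuming a 365.2425-day year and a month of one twelfth of that year (assumptions that reproduce the slide's numbers); the function names `allowed_downtime` and `format_duration` are illustrative only:

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Period lengths in seconds. The year and month lengths are assumptions
# chosen because they reproduce the figures quoted on the slide.
PERIODS = {
    "daily": SECONDS_PER_DAY,
    "weekly": 7 * SECONDS_PER_DAY,
    "monthly": 365.2425 / 12 * SECONDS_PER_DAY,
    "yearly": 365.2425 * SECONDS_PER_DAY,
}


def allowed_downtime(availability_percent):
    """Return the allowed downtime in seconds for each period."""
    unavailable_fraction = 1 - availability_percent / 100
    return {name: unavailable_fraction * length for name, length in PERIODS.items()}


def format_duration(seconds):
    """Format seconds as 'Xm Y.Ys', or 'Y.Ys' when under a minute."""
    total = round(seconds, 1)
    minutes, rest = divmod(total, 60)
    if minutes:
        return f"{int(minutes)}m {rest:.1f}s"
    return f"{rest:.1f}s"


if __name__ == "__main__":
    for target in (99.999, 99.99, 99.95):
        budget = allowed_downtime(target)
        summary = ", ".join(f"{k}: {format_duration(v)}" for k, v in budget.items())
        print(f"{target}% -> {summary}")
```

Running the script prints the daily, weekly, monthly, and yearly downtime allowance for each availability target.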
Eight Tenets of Google SRE
1. Ensuring a Durable Focus on Engineering
2. Pursuing Maximum Change Velocity Without Violating a Service’s SLO
3. Monitoring
4. Emergency Response
5. Change Management
6. Demand Forecasting and Capacity Planning
7. Provisioning
8. Efficiency and Performance
Best Practices of Incident Management
1. Prioritize
2. Prepare
3. Trust
4. Introspect
5. Consider alternatives
6. Practice
7. Change it around
Image Courtesy:
Universal Pictures, NASA
[Figure: four delivery pipelines (Development, SCM, Build, Package Repo, Deploy) leading to Test, Stage, Production; labeled Enterprise Release.]
Applications: Mainframe Hosted App • Mobile App • App Server Monolithic App • Cloud Native App
Agile/Innovation Edge: Rapid Delivery for Innovation • Agile • Antifragile • Experimentation • New and Innovative • Hybrid Cloud • IaaS/PaaS • Containers
Industrialized Core: Deliver at regular cadence • Agile • Stability • Predictability • Lean Delivery pipeline • Core and Legacy Systems
Hybrid Infrastructure – Physical, Cloud • IaaS/PaaS • Containers
Axis: Business Capability
DevOps + SRE in the Enterprise
Balancing Innovation and Optimization
[Figure: the same four-pipeline diagram as the previous slide, with the applications labeled Application A, Application B, Application C, and Application N.]
Touchpoints of Standardization Across Delivery Pipelines
• Deployment Automation and Orchestration
• Service and Test Environment Virtualization
• APIs
• Planning and Architecture
• Release Management
• Operational Readiness
[Figure: the same four-pipeline diagram as the previous slide (Applications A, B, C, and N), with the touchpoints marked by two labels: DevOps and SRE.]
When DevOps met SRE
• Deployment Automation and Orchestration
• Service and Test Environment Virtualization
• APIs
• Planning and Architecture
• Release Management
• Operational Readiness
Your Delivery Pipeline
will be as fast as the
slowest Delivery
Pipeline it is
dependent on
Architecture and Planning
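One way to read this claim: a pipeline's effective delivery cadence is the slowest cadence among itself and everything it depends on, directly or transitively. A minimal sketch under that reading; the pipeline names, cycle times, and the function `effective_cycle_days` are hypothetical and not taken from the deck:

```python
def effective_cycle_days(pipeline, cycle_days, depends_on, _seen=None):
    """Return the slowest cycle time among a pipeline and its dependencies.

    cycle_days: maps pipeline name -> its own delivery cycle in days.
    depends_on: maps pipeline name -> list of pipelines it depends on.
    """
    seen = set() if _seen is None else _seen
    if pipeline in seen:
        # Already counted (shared dependency or a cycle); contributes nothing new.
        return 0.0
    seen.add(pipeline)
    slowest = cycle_days[pipeline]
    for dependency in depends_on.get(pipeline, ()):
        slowest = max(
            slowest, effective_cycle_days(dependency, cycle_days, depends_on, seen)
        )
    return slowest


if __name__ == "__main__":
    # Hypothetical example: a mobile app that depends on an API layer,
    # which in turn depends on a mainframe release.
    cycle_days = {"mobile": 14, "api": 30, "mainframe": 90}
    depends_on = {"mobile": ["api"], "api": ["mainframe"]}
    for name in cycle_days:
        print(name, effective_cycle_days(name, cycle_days, depends_on))
```

In this made-up example the mobile team may run 14-day cycles, yet a feature that needs a mainframe change ships no faster than the 90-day mainframe cadence.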
Modernizing to a
Microservices-based
Architecture:
Refactoring Code
and Data and
defining REST APIs
APIs
Developers are paid
to write code, not
maintain deployment
and configuration
scripts
Application Deployment and Environment
Orchestration
If you are doing 2-week Sprints,
but it takes 3 weeks to
get a Test Server,
how long are your
Sprints?
Test Service and Environment Virtualization
It is not possible to
patch the software of
a missile AFTER it
has been launched
Release Management
Shift thinking from
Mean Time Between
Failures (MTBF) to
Mean Time To
Repair (MTTR).
Operational Readiness for SRE
MTTR Calculus
Mean Time to Repair =
Mean Time to Detect + Mean Time to Triage +
Mean Time to Restore
+ Mean Time to Pass Blame…
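The sum above can be computed from per-incident timings. A minimal sketch, assuming each incident records how long detection, triage, and restoration took in minutes; the `Incident` type, its field names, and the sample numbers are hypothetical, and the blame term from the slide is deliberately left out:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    # All durations are in minutes (hypothetical record layout).
    detect_minutes: float
    triage_minutes: float
    restore_minutes: float


def mttr_breakdown(incidents):
    """Return the mean of each phase and their sum (MTTR), in minutes."""
    detect = mean(i.detect_minutes for i in incidents)
    triage = mean(i.triage_minutes for i in incidents)
    restore = mean(i.restore_minutes for i in incidents)
    return {
        "detect": detect,
        "triage": triage,
        "restore": restore,
        "mttr": detect + triage + restore,
    }


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    history = [Incident(10, 20, 30), Incident(20, 40, 60)]
    print(mttr_breakdown(history))
```

Breaking MTTR into its phases shows where the time goes: a long detection phase points at monitoring, a long triage phase at diagnostics and ownership, a long restore phase at rollback and deployment automation.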
Antifragile Systems
Antifragile: Things that are
neither fragile nor robust,
but rather thrive in chaos.
Delivering Antifragile Systems
Servers may go “red,”
services are always
“green”
Cattle not pets
Fragility in systems actually
comes from a desire to make
them too robust.
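The "servers red, services green" idea can be illustrated by deriving service health from the pool rather than from any single server. A minimal sketch, assuming a service counts as "green" while at least `min_healthy` interchangeable servers are healthy; the function name, the threshold, and the server names are hypothetical:

```python
def service_status(server_states, min_healthy=1):
    """Return 'green' if enough servers in the pool are healthy, else 'red'.

    server_states: maps server name -> 'green' or 'red'.
    min_healthy: assumed minimum number of healthy servers the service needs.
    """
    healthy = sum(1 for state in server_states.values() if state == "green")
    return "green" if healthy >= min_healthy else "red"


if __name__ == "__main__":
    # One server in the pool has failed, but the service stays green.
    pool = {"web-1": "red", "web-2": "green", "web-3": "green"}
    print(service_status(pool, min_healthy=2))
```

Because no individual server matters ("cattle not pets"), a failed server is replaced rather than repaired, and the service-level signal stays green throughout.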
Organizational Change
• “Everyone is responsible for
Delivering to Production”
• Squad-Tribe-Guild Team Model
• SRE Squads
• A Learning Organization
When DevOps meets SRE
DevOps: “Everyone is responsible for
delivery to production.”
SRE: “(Everyone) is responsible for
delivering Continuous Business Value”
Any questions?
THANK YOU
@sd_architect
http://sdarchitect.blog
