Site Reliability Engineering:
Harnessing (and redefining) it for ITSM
Jon Stevens-Hall
Principal Product Manager (BMC Software) and ITIL® 4 Contributing Author
“Service Management Post Corona” virtual event: 10th June 2020
A brief introduction to SRE
• Google’s SRE handbook: first published 2016
• Free online: https://landing.google.com/sre/books/
“SRE is what happens when you ask a software engineer
to design an operations team”
• Some SREs at Google are software engineers hired in the standard way
• Others came close to qualifying as software engineers and brought other infrastructure skills
• SREs help software developers run their services, but also expect accountability from them.
@jonstevenshall
Me
• ITSM veteran – 23 years
• Principal Product Manager with BMC Software, for our enterprise ITSM toolset, BMC Helix ITSM
• ITIL® 4 contributing author
• Strong focus on DevOps and new ways of working
• Asked by SDI to talk about SRE today
Twitter: @JonStevensHall
• https://www.bmc.com/blogs/author/jonhall/
• https://medium.com/@jonhall_
Subjects covered in the SRE handbook
• Risk
• Service Level Objectives
• Eliminating Toil
• Monitoring
• Distributed Systems
• Automation
• Release Engineering
• Alerting
• On-Call
• Handling Incidents
• Postmortem
• Scheduling
• Human Welfare
• Communication
Defining Toil
• Google’s SRE handbook describes “toil” as work which is:
  • Manual
  • Repetitive
  • Automatable
  • Tactical
  • Devoid of enduring value
  • Linearly scaling
• SRE teams can spend a maximum of 50% of their time on work considered to be “toil”.
Service Level Objective
Availability = Successful requests / Total requests
Availability = Uptime / (Uptime + Downtime)
SLOs are not SLAs. Not all Google services have SLAs, but some do.
SLOs define appropriate technical bounds for service performance.
“99.9% of RPC calls will complete in less than 100 ms”
“The service will be available 99.99% of the time”
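The two availability ratios on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the sample figures are assumptions made for the example, not taken from the SRE handbook.

```python
def request_availability(successful_requests: int, total_requests: int) -> float:
    """Availability = successful requests / total requests."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


def time_availability(uptime: float, downtime: float) -> float:
    """Availability = uptime / (uptime + downtime), in any consistent time unit."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("uptime + downtime must be positive")
    return uptime / total


def meets_slo(availability: float, slo_target: float) -> bool:
    """True when the measured availability is at or above the SLO target."""
    return availability >= slo_target


# Hypothetical sample figures, chosen only for illustration.
measured = request_availability(successful_requests=9_995, total_requests=10_000)
print(meets_slo(measured, slo_target=0.999))   # compared against a 99.9% SLO
print(meets_slo(measured, slo_target=0.9999))  # compared against a 99.99% SLO
```

The same `meets_slo` check applies to either ratio; which one is appropriate depends on whether the service is better measured by requests or by time.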
Error Budget
Defines how unreliable the service is allowed to be,
within a given time period.
[Chart: availability from 100% down to the required performance (SLO target), showing outages to date and the remaining error budget]
“An outage is no longer a bad thing.
It is an expected part of the process of innovation,
and an occurrence that development and SRE teams
manage rather than fear”
Outages are okay, within error budget tolerances
Error Budget
With a 99.9% SLO, the error budget is 0.1%. Outages have already consumed 0.05%.
Error budget remaining is 0.05%
Hence an additional outage which consumes 0.02% of the quarter’s availability would cost 40% of the remaining error budget.
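The arithmetic in this example can be checked with a short Python sketch. The function names are hypothetical, and `Fraction` is used only to keep the percentages exact.

```python
from fractions import Fraction


def error_budget(slo_target: Fraction) -> Fraction:
    """Total allowed unreliability for the period: 1 - SLO target."""
    return 1 - slo_target


def remaining_budget(slo_target: Fraction, consumed: Fraction) -> Fraction:
    """Error budget left after the outages so far."""
    return error_budget(slo_target) - consumed


def share_of_remaining(outage: Fraction, remaining: Fraction) -> Fraction:
    """Fraction of the remaining budget that a new outage would use up."""
    return outage / remaining


slo = Fraction("0.999")          # 99.9% SLO
consumed = Fraction("0.0005")    # 0.05% of availability already lost to outages
new_outage = Fraction("0.0002")  # a further outage costing 0.02%

remaining = remaining_budget(slo, consumed)
cost = share_of_remaining(new_outage, remaining)
print(f"remaining budget: {float(remaining):.2%}")        # 0.05%
print(f"new outage uses {float(cost):.0%} of what is left")  # 40%
```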
Balancing innovation and reliability
• Large budget: developers can take
more risks
• Budget nearly drained: developers tend
to slow down
• Budget used up: releases temporarily
halted
Error Budget shapes ways of working
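The three budget states listed on this slide could be expressed as a simple policy function. This is a sketch under stated assumptions: the slide gives no numeric thresholds, so the 20% "nearly drained" cut-off and all names below are hypothetical.

```python
from enum import Enum


class ReleasePosture(Enum):
    TAKE_RISKS = "large budget: developers can take more risks"
    SLOW_DOWN = "budget nearly drained: developers slow down"
    HALT = "budget used up: releases temporarily halted"


def release_posture(remaining_fraction: float,
                    nearly_drained_below: float = 0.2) -> ReleasePosture:
    """Map the share of error budget still unspent (0.0 to 1.0) to a posture.

    nearly_drained_below is an assumed threshold, not a value from the source.
    """
    if remaining_fraction <= 0:
        return ReleasePosture.HALT
    if remaining_fraction < nearly_drained_below:
        return ReleasePosture.SLOW_DOWN
    return ReleasePosture.TAKE_RISKS


for share in (0.5, 0.1, 0.0):
    print(f"{share:.0%} of budget left -> {release_posture(share).value}")
```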
Martin Vorel via Libreshot
How does SRE align to ITSM as we know it?
“Site Reliability Engineering (SRE) is an
emerging IT service management
(ITSM) framework”
(First line of the report, published 1st April 2020)
Will SRE replace ITSM as we know it?
“As a service management alternative, SRE updates
traditional ITSM activities with innovative and self-
organizing concepts such as…
• Management to service level objectives
• Error budgets
• Toil reduction
• Release engineering
• Monitoring/observability
• Embracing risk”
Jayne Groll (@JayneGroll), DevOps.com, 2020
A counterpoint: Most of us are not Google.
“If you think spinning up an ops team and calling it
SRE is ‘an implementation of DevOps’ you’ve
swallowed the worst poison pill the DevOps talk
circuit can deal to you.”
• Unless you are Google scale, a separate
production operations team is inefficient
• DevOps showed that “throw over the wall
from Dev to Ops” was wrong
• Throwing over the wall from Dev to SRE
doesn’t improve that
The Agile Admin – Blog (2018).
One size doesn’t fit all
“Toyota opens its factory doors to us again and
again, but I imagine Toyota’s leaders may also be
shaking their heads and thinking, ‘Sure, come
have a look. But why are you so interested in the
solutions we develop for our specific problems?
Why do you never study how we go about
developing those solutions?’”
Mike Rother – Toyota Kata (2009)
What’s in a name? Rethinking SRE for ITSM
• “Site Reliability Engineering”
• In most cases, IT services aren’t “sites”
• A lot of ITSM work isn’t software engineering
• Reliability is a broad term
• Also, it’s not easy to apply directly to commercial off-the-shelf software (especially SaaS) which underpins many IT services.
© Copyright 2020 BMC Software, Inc.
SRE is already an established profession
Job advert for “disruptive Fintech”, retrieved May 2020
ITIL® 4 maps SRE principles to 14 practices
• Availability management
• Capacity and performance management
• Change enablement
• Incident management
• Infrastructure and platform management
• Monitoring and event management
• Problem management
• Service design
• Software development and management
• Deployment management
• Organizational change management
• Release management
• Service configuration management
• Service validation and testing
(but at a very high level)
Example of an “SRE-like” change for IT support
Before:
• Current service desk performance: 98% call pickup, against a target of 90%
• Agents fully utilised on calls
Applying SRE thinking:
• Schedule a proportion of each agent’s time to be away from calls
• Use that time for proactive reduction of toil (e.g. building self-service offerings, developing knowledge content)
• The previous 8-percentage-point over-performance in call pickup is spent as “error budget”
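The service desk example can be put into numbers with a small Python sketch. The 98% and 90% figures come from the slide; the weekly call volume of 1,000 and the function names are assumptions made purely for illustration.

```python
import math
from fractions import Fraction


def pickup_headroom(current_pickup: Fraction, target_pickup: Fraction) -> Fraction:
    """Over-performance against the call pickup target, never below zero."""
    return max(current_pickup - target_pickup, Fraction(0))


def extra_missed_calls_allowed(calls: int,
                               current_pickup: Fraction,
                               target_pickup: Fraction) -> int:
    """How many more calls could go unanswered before the target is breached."""
    return math.floor(calls * pickup_headroom(current_pickup, target_pickup))


current = Fraction(98, 100)  # 98% call pickup today (from the slide)
target = Fraction(90, 100)   # 90% target (from the slide)
weekly_calls = 1_000         # hypothetical call volume

headroom = pickup_headroom(current, target)
print(f"headroom: {float(headroom):.0%} of calls")
print(f"calls that can be traded for toil reduction: "
      f"{extra_missed_calls_allowed(weekly_calls, current, target)}")
```

Treating the headroom as a budget makes the trade explicit: time taken away from calls is acceptable as long as pickup stays at or above the target.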
SRE applied to Change Enablement
Before:
• Change approval interventions determined by perceived risk of change type
• Pressure from DevOps to remove approval barriers (particularly CAB)
Applying SRE thinking:
• Dynamically assess approval level based on confidence level in team:
  • Implementation success rate of previous changes
  • Issues caused by previous changes
  • Adherence to guardrails in the DevOps toolchain
• Minimal intervention, applied more smartly
Example of Change Enablement with SRE principles
Example:
• Two development teams with different track records (Dev Group 2 has higher initial confidence than Dev Group 1)
• Dynamically adjust to changing success rates (watch what happens after a Dev Group 2 failure)
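One way to picture the dynamic adjustment described here is a per-team confidence score that drifts up with each successful change and drops sharply after a failure. The sketch below is an assumption-laden illustration, not the method shown in the original chart: the starting scores, step sizes, thresholds and approval level names are all hypothetical, and only change success or failure is modelled (guardrail adherence is left out).

```python
from dataclasses import dataclass


@dataclass
class TeamConfidence:
    name: str
    score: float  # 0.0 (no confidence) to 1.0 (full confidence)

    def record_change(self, succeeded: bool,
                      gain: float = 0.02, penalty: float = 0.2) -> None:
        """Raise the score slightly on success, lower it sharply on failure."""
        delta = gain if succeeded else -penalty
        self.score = min(1.0, max(0.0, self.score + delta))


def approval_level(score: float,
                   auto_above: float = 0.8, peer_above: float = 0.5) -> str:
    """Pick the lightest approval step the current confidence allows."""
    if score >= auto_above:
        return "automatic approval"
    if score >= peer_above:
        return "peer review"
    return "full change review"


group1 = TeamConfidence("Dev Group 1", score=0.6)  # lower initial confidence
group2 = TeamConfidence("Dev Group 2", score=0.9)  # higher initial confidence

print(group2.name, "->", approval_level(group2.score))
group2.record_change(succeeded=False)              # a Dev Group 2 failure
print(group2.name, "after a failure ->", approval_level(group2.score))
group1.record_change(succeeded=True)
print(group1.name, "after a success ->", approval_level(group1.score))
```

The asymmetric step sizes are a design choice for the sketch: confidence is earned slowly and lost quickly, so a single failure moves a team back to a heavier approval step.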
Summary of issues
• Site Reliability Engineering is a bad name for many aspects of ITSM.
• SRE is already defined (and recruited) as a specialist operations developer role.
• Some key, novel themes have great potential but need redefinition for much of ITSM.
Proposal
• “Rebrand” the SRE concept for ITSM: New name, new thinking.
• Identify new ways to apply key novel principles such as reliability and toil reduction.
• Invest in ITSM initiatives such as KCS which share good principles with SRE.
Referenced materials and further reading
• https://theagileadmin.com/2018/10/02/sre-the-biggest-lie-since-kanban/ (@ernestmueller)
• https://devops.com/sre-is-the-most-innovative-approach-to-itsm-since-itil/ (@JayneGroll)
• http://www-personal.umich.edu/~mrother/Homepage.html (@RealMikeRother)
• https://www.bmc.com/blogs/sre-vs-itops/ (@DevDroid9)
• https://medium.com/@JonHall_/what-can-site-reliability-engineers-teach-service-desks-about-toil-e1df38ca284e
• https://landing.google.com/sre/books/ (@srebook)
Thank you!

More Related Content

What's hot

DevOpsDays Riga - Swarming Presentation
DevOpsDays Riga - Swarming PresentationDevOpsDays Riga - Swarming Presentation
DevOpsDays Riga - Swarming Presentation
Jon Stevens-Hall
 
ITSM, Swarming and Devops
ITSM, Swarming and DevopsITSM, Swarming and Devops
ITSM, Swarming and Devops
Jon Stevens-Hall
 
Devopsdays Edinburgh 2017 - Ignite talk - Swarming
Devopsdays Edinburgh 2017 - Ignite talk - SwarmingDevopsdays Edinburgh 2017 - Ignite talk - Swarming
Devopsdays Edinburgh 2017 - Ignite talk - Swarming
Jon Stevens-Hall
 
Devops In The Enterprise: How Swarming Can Fix The Problem Of Becoming A 3rd-...
Devops In The Enterprise:How Swarming Can Fix The Problem Of Becoming A 3rd-...Devops In The Enterprise:How Swarming Can Fix The Problem Of Becoming A 3rd-...
Devops In The Enterprise: How Swarming Can Fix The Problem Of Becoming A 3rd-...
Jon Stevens-Hall
 
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...Swarming: How a new approach to support can save DevOps teams from 3rd-line t...
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...
Jon Stevens-Hall
 
BMC Engage 2015: Optimizing Service Desk Interactions with Knowledge Management
BMC Engage 2015: Optimizing Service Desk Interactions with Knowledge ManagementBMC Engage 2015: Optimizing Service Desk Interactions with Knowledge Management
BMC Engage 2015: Optimizing Service Desk Interactions with Knowledge Management
Jon Stevens-Hall
 
Brighttalk high scale low touch and other bedtime stories - final
Brighttalk   high scale low touch and other bedtime stories - finalBrighttalk   high scale low touch and other bedtime stories - final
Brighttalk high scale low touch and other bedtime stories - final
Andrew White
 
Brighttalk converged infrastructure and it operations management - final
Brighttalk   converged infrastructure and it operations management - finalBrighttalk   converged infrastructure and it operations management - final
Brighttalk converged infrastructure and it operations management - final
Andrew White
 
Brighttalk understanding the promise of sde - final
Brighttalk   understanding the promise of sde - finalBrighttalk   understanding the promise of sde - final
Brighttalk understanding the promise of sde - final
Andrew White
 
Brighttalk brining it all together - final
Brighttalk   brining it all together - finalBrighttalk   brining it all together - final
Brighttalk brining it all together - final
Andrew White
 
Webinar: Machine learning analytics for immediate resolution to the most chal...
Webinar: Machine learning analytics for immediate resolution to the most chal...Webinar: Machine learning analytics for immediate resolution to the most chal...
Webinar: Machine learning analytics for immediate resolution to the most chal...
Melina Black
 
MTB03015USEN.PDF
MTB03015USEN.PDFMTB03015USEN.PDF
MTB03015USEN.PDF
Marisol Rawlins
 
How to improve your system monitoring
How to improve your system monitoringHow to improve your system monitoring
How to improve your system monitoring
Andrew White
 
Bimodal IT: Shortcut to Innovation or Path to Dysfunction?
Bimodal IT: Shortcut to Innovation or Path to Dysfunction?Bimodal IT: Shortcut to Innovation or Path to Dysfunction?
Bimodal IT: Shortcut to Innovation or Path to Dysfunction?
dev2ops
 
Bright talk running a cloud - final
Bright talk   running a cloud - finalBright talk   running a cloud - final
Bright talk running a cloud - final
Andrew White
 
World Wide Technology Webinar - Software Defined Networking
World Wide Technology Webinar - Software Defined NetworkingWorld Wide Technology Webinar - Software Defined Networking
World Wide Technology Webinar - Software Defined Networking
World Wide Technology
 
Brighttalk learning to cook- network management recipes - final
Brighttalk   learning to cook- network management recipes - finalBrighttalk   learning to cook- network management recipes - final
Brighttalk learning to cook- network management recipes - final
Andrew White
 
World Wide Technology Webinar Transcript - Software Defined Networking
World Wide Technology Webinar Transcript - Software Defined NetworkingWorld Wide Technology Webinar Transcript - Software Defined Networking
World Wide Technology Webinar Transcript - Software Defined Networking
World Wide Technology
 
Brighttalk getting back on track - final
Brighttalk   getting back on track - finalBrighttalk   getting back on track - final
Brighttalk getting back on track - final
Andrew White
 
Cloud Scars: Lessons from the Enterprise Pioneers
Cloud Scars: Lessons from the Enterprise PioneersCloud Scars: Lessons from the Enterprise Pioneers
Cloud Scars: Lessons from the Enterprise Pioneers
Dave Roberts
 

What's hot (20)

DevOpsDays Riga - Swarming Presentation
DevOpsDays Riga - Swarming PresentationDevOpsDays Riga - Swarming Presentation
DevOpsDays Riga - Swarming Presentation
 
ITSM, Swarming and Devops
ITSM, Swarming and DevopsITSM, Swarming and Devops
ITSM, Swarming and Devops
 
Devopsdays Edinburgh 2017 - Ignite talk - Swarming
Devopsdays Edinburgh 2017 - Ignite talk - SwarmingDevopsdays Edinburgh 2017 - Ignite talk - Swarming
Devopsdays Edinburgh 2017 - Ignite talk - Swarming
 
Devops In The Enterprise: How Swarming Can Fix The Problem Of Becoming A 3rd-...
Devops In The Enterprise:How Swarming Can Fix The Problem Of Becoming A 3rd-...Devops In The Enterprise:How Swarming Can Fix The Problem Of Becoming A 3rd-...
Devops In The Enterprise: How Swarming Can Fix The Problem Of Becoming A 3rd-...
 
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...Swarming: How a new approach to support can save DevOps teams from 3rd-line t...
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...
 
BMC Engage 2015: Optimizing Service Desk Interactions with Knowledge Management
BMC Engage 2015: Optimizing Service Desk Interactions with Knowledge ManagementBMC Engage 2015: Optimizing Service Desk Interactions with Knowledge Management
BMC Engage 2015: Optimizing Service Desk Interactions with Knowledge Management
 
Brighttalk high scale low touch and other bedtime stories - final
Brighttalk   high scale low touch and other bedtime stories - finalBrighttalk   high scale low touch and other bedtime stories - final
Brighttalk high scale low touch and other bedtime stories - final
 
Brighttalk converged infrastructure and it operations management - final
Brighttalk   converged infrastructure and it operations management - finalBrighttalk   converged infrastructure and it operations management - final
Brighttalk converged infrastructure and it operations management - final
 
Brighttalk understanding the promise of sde - final
Brighttalk   understanding the promise of sde - finalBrighttalk   understanding the promise of sde - final
Brighttalk understanding the promise of sde - final
 
Brighttalk brining it all together - final
Brighttalk   brining it all together - finalBrighttalk   brining it all together - final
Brighttalk brining it all together - final
 
Webinar: Machine learning analytics for immediate resolution to the most chal...
Webinar: Machine learning analytics for immediate resolution to the most chal...Webinar: Machine learning analytics for immediate resolution to the most chal...
Webinar: Machine learning analytics for immediate resolution to the most chal...
 
MTB03015USEN.PDF
MTB03015USEN.PDFMTB03015USEN.PDF
MTB03015USEN.PDF
 
How to improve your system monitoring
How to improve your system monitoringHow to improve your system monitoring
How to improve your system monitoring
 
Bimodal IT: Shortcut to Innovation or Path to Dysfunction?
Bimodal IT: Shortcut to Innovation or Path to Dysfunction?Bimodal IT: Shortcut to Innovation or Path to Dysfunction?
Bimodal IT: Shortcut to Innovation or Path to Dysfunction?
 
Bright talk running a cloud - final
Bright talk   running a cloud - finalBright talk   running a cloud - final
Bright talk running a cloud - final
 
World Wide Technology Webinar - Software Defined Networking
World Wide Technology Webinar - Software Defined NetworkingWorld Wide Technology Webinar - Software Defined Networking
World Wide Technology Webinar - Software Defined Networking
 
Brighttalk learning to cook- network management recipes - final
Brighttalk   learning to cook- network management recipes - finalBrighttalk   learning to cook- network management recipes - final
Brighttalk learning to cook- network management recipes - final
 
World Wide Technology Webinar Transcript - Software Defined Networking
World Wide Technology Webinar Transcript - Software Defined NetworkingWorld Wide Technology Webinar Transcript - Software Defined Networking
World Wide Technology Webinar Transcript - Software Defined Networking
 
Brighttalk getting back on track - final
Brighttalk   getting back on track - finalBrighttalk   getting back on track - final
Brighttalk getting back on track - final
 
Cloud Scars: Lessons from the Enterprise Pioneers
Cloud Scars: Lessons from the Enterprise PioneersCloud Scars: Lessons from the Enterprise Pioneers
Cloud Scars: Lessons from the Enterprise Pioneers
 

Similar to Site Reliability Engineering: Harnessing (and redefining) it for ITSM

SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...
DevClub_lv
 
Is DevOps Really Changing IT Support?
Is DevOps Really Changing IT Support?Is DevOps Really Changing IT Support?
Is DevOps Really Changing IT Support?
Jon Stevens-Hall
 
DBA Role Shift in a DevOps World
DBA Role Shift in a DevOps WorldDBA Role Shift in a DevOps World
DBA Role Shift in a DevOps World
Datavail
 
The Future of Change Management and DevOps for Dummies
The Future of Change Management and DevOps for DummiesThe Future of Change Management and DevOps for Dummies
The Future of Change Management and DevOps for Dummies
DBmaestro - Database DevOps
 
Galorath - IT Data Collection, Analysis and Benchmarking: From Processes and...
Galorath -  IT Data Collection, Analysis and Benchmarking: From Processes and...Galorath -  IT Data Collection, Analysis and Benchmarking: From Processes and...
Galorath - IT Data Collection, Analysis and Benchmarking: From Processes and...
International Software Benchmarking Standards Group (ISBSG)
 
Lownes_Unit9
Lownes_Unit9Lownes_Unit9
Lownes_Unit9
Warren Lownes
 
Creating Creating Service Systems
Creating Creating Service SystemsCreating Creating Service Systems
Culture is more important than competence in IT outsourcing
Culture is more important than competence in IT outsourcingCulture is more important than competence in IT outsourcing
Culture is more important than competence in IT outsourcing
BJIT Ltd
 
Devops intro
Devops introDevops intro
Devops intro
Pallavi Mudaliar
 
Un Architecture
Un ArchitectureUn Architecture
Un Architecture
chrisonea
 
Leveraging Failure to Succeed in DevOps
Leveraging Failure to Succeed in DevOpsLeveraging Failure to Succeed in DevOps
Leveraging Failure to Succeed in DevOps
Steve Brown
 
How to Drive Maximum Business Value from IT Investments with the Flow Framework
How to Drive Maximum Business Value from IT Investments with the Flow FrameworkHow to Drive Maximum Business Value from IT Investments with the Flow Framework
How to Drive Maximum Business Value from IT Investments with the Flow Framework
Tasktop
 
Microservices
MicroservicesMicroservices
Microservices
PT.JUG
 
Culture Is More Important Than Competence In IT.pptx
Culture Is More Important Than Competence In IT.pptxCulture Is More Important Than Competence In IT.pptx
Culture Is More Important Than Competence In IT.pptx
mushrunayasmin
 
Obsidian Agile DevOps
Obsidian Agile DevOpsObsidian Agile DevOps
Obsidian Agile DevOps
David A. Callner
 
DeveloperWeek 2020: Cloud-Native and Microservices Accelerate Process Improve...
DeveloperWeek 2020: Cloud-Native and Microservices Accelerate Process Improve...DeveloperWeek 2020: Cloud-Native and Microservices Accelerate Process Improve...
DeveloperWeek 2020: Cloud-Native and Microservices Accelerate Process Improve...
Shahir Daya
 
Feature Scoring in Green Field Application Development and DevOps
Feature Scoring in Green Field Application Development and DevOpsFeature Scoring in Green Field Application Development and DevOps
Feature Scoring in Green Field Application Development and DevOps
DevOps Indonesia
 
The Evolution of Application Release Automation
The Evolution of Application Release AutomationThe Evolution of Application Release Automation
The Evolution of Application Release Automation
Jules Pierre-Louis
 
Agile and SOA Comparing the Two
Agile and SOA Comparing the TwoAgile and SOA Comparing the Two
Agile and SOA Comparing the Two
Sally Elatta
 
Who needs EA… when we have DevOps?
Who needs EA… when we have DevOps?Who needs EA… when we have DevOps?
Who needs EA… when we have DevOps?
Jeff Jakubiak
 

Similar to Site Reliability Engineering: Harnessing (and redefining) it for ITSM (20)

SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...
 
Is DevOps Really Changing IT Support?
Is DevOps Really Changing IT Support?Is DevOps Really Changing IT Support?
Is DevOps Really Changing IT Support?
 
DBA Role Shift in a DevOps World
DBA Role Shift in a DevOps WorldDBA Role Shift in a DevOps World
DBA Role Shift in a DevOps World
 
The Future of Change Management and DevOps for Dummies
The Future of Change Management and DevOps for DummiesThe Future of Change Management and DevOps for Dummies
The Future of Change Management and DevOps for Dummies
 
Galorath - IT Data Collection, Analysis and Benchmarking: From Processes and...
Galorath -  IT Data Collection, Analysis and Benchmarking: From Processes and...Galorath -  IT Data Collection, Analysis and Benchmarking: From Processes and...
Galorath - IT Data Collection, Analysis and Benchmarking: From Processes and...
 
Lownes_Unit9
Lownes_Unit9Lownes_Unit9
Lownes_Unit9
 
Creating Creating Service Systems
Creating Creating Service SystemsCreating Creating Service Systems
Creating Creating Service Systems
 
Culture is more important than competence in IT outsourcing
Culture is more important than competence in IT outsourcingCulture is more important than competence in IT outsourcing
Culture is more important than competence in IT outsourcing
 
Devops intro
Devops introDevops intro
Devops intro
 
Un Architecture
Un ArchitectureUn Architecture
Un Architecture
 
Leveraging Failure to Succeed in DevOps
Leveraging Failure to Succeed in DevOpsLeveraging Failure to Succeed in DevOps
Leveraging Failure to Succeed in DevOps
 
How to Drive Maximum Business Value from IT Investments with the Flow Framework
How to Drive Maximum Business Value from IT Investments with the Flow FrameworkHow to Drive Maximum Business Value from IT Investments with the Flow Framework
How to Drive Maximum Business Value from IT Investments with the Flow Framework
 
Microservices
MicroservicesMicroservices
Microservices
 
Culture Is More Important Than Competence In IT.pptx
Culture Is More Important Than Competence In IT.pptxCulture Is More Important Than Competence In IT.pptx
Culture Is More Important Than Competence In IT.pptx
 
Obsidian Agile DevOps
Obsidian Agile DevOpsObsidian Agile DevOps
Obsidian Agile DevOps
 
DeveloperWeek 2020: Cloud-Native and Microservices Accelerate Process Improve...
DeveloperWeek 2020: Cloud-Native and Microservices Accelerate Process Improve...DeveloperWeek 2020: Cloud-Native and Microservices Accelerate Process Improve...
DeveloperWeek 2020: Cloud-Native and Microservices Accelerate Process Improve...
 
Feature Scoring in Green Field Application Development and DevOps
Feature Scoring in Green Field Application Development and DevOpsFeature Scoring in Green Field Application Development and DevOps
Feature Scoring in Green Field Application Development and DevOps
 
The Evolution of Application Release Automation
The Evolution of Application Release AutomationThe Evolution of Application Release Automation
The Evolution of Application Release Automation
 
Agile and SOA Comparing the Two
Agile and SOA Comparing the TwoAgile and SOA Comparing the Two
Agile and SOA Comparing the Two
 
Who needs EA… when we have DevOps?
Who needs EA… when we have DevOps?Who needs EA… when we have DevOps?
Who needs EA… when we have DevOps?
 

More from Jon Stevens-Hall

DevOps Enterprise Summit Las Vegas 2018: The Problem of Becoming a 3rd-Line S...
DevOps Enterprise Summit Las Vegas 2018: The Problem of Becoming a 3rd-Line S...DevOps Enterprise Summit Las Vegas 2018: The Problem of Becoming a 3rd-Line S...
DevOps Enterprise Summit Las Vegas 2018: The Problem of Becoming a 3rd-Line S...
Jon Stevens-Hall
 
Service Manager Dag, Netherlands 2018: Why we should ditch the 3-tier support...
Service Manager Dag, Netherlands 2018: Why we should ditch the 3-tier support...Service Manager Dag, Netherlands 2018: Why we should ditch the 3-tier support...
Service Manager Dag, Netherlands 2018: Why we should ditch the 3-tier support...
Jon Stevens-Hall
 
Configuration Management Camp 2018: The problem of becoming "3rd line support...
Configuration Management Camp 2018: The problem of becoming "3rd line support...Configuration Management Camp 2018: The problem of becoming "3rd line support...
Configuration Management Camp 2018: The problem of becoming "3rd line support...
Jon Stevens-Hall
 
devopsdays Stockholm Ignite talk: Aligning DevOps with Enterprise-scale custo...
devopsdays Stockholm Ignite talk: Aligning DevOps with Enterprise-scale custo...devopsdays Stockholm Ignite talk: Aligning DevOps with Enterprise-scale custo...
devopsdays Stockholm Ignite talk: Aligning DevOps with Enterprise-scale custo...
Jon Stevens-Hall
 
Knowledge Management in BMC Remedy 9.1
Knowledge Management in BMC Remedy 9.1Knowledge Management in BMC Remedy 9.1
Knowledge Management in BMC Remedy 9.1
Jon Stevens-Hall
 
How the Internet of Things and 20 billion devices will change your job
How the Internet of Things and 20 billion devices will change your jobHow the Internet of Things and 20 billion devices will change your job
How the Internet of Things and 20 billion devices will change your job
Jon Stevens-Hall
 
IAITAM ACE 2016, New Orleans - Presentation
IAITAM ACE 2016, New Orleans - PresentationIAITAM ACE 2016, New Orleans - Presentation
IAITAM ACE 2016, New Orleans - Presentation
Jon Stevens-Hall
 
Evolving Service for the Digital Workplace
Evolving Service for the Digital WorkplaceEvolving Service for the Digital Workplace
Evolving Service for the Digital Workplace
Jon Stevens-Hall
 
Optimizing Service Desk Interactions with Knowledge Management - BMC Engage 2015
Optimizing Service Desk Interactions with Knowledge Management - BMC Engage 2015Optimizing Service Desk Interactions with Knowledge Management - BMC Engage 2015
Optimizing Service Desk Interactions with Knowledge Management - BMC Engage 2015
Jon Stevens-Hall
 
BMC Engage 2015: IT Asset Management - An essential pillar for the digital en...
BMC Engage 2015: IT Asset Management - An essential pillar for the digital en...BMC Engage 2015: IT Asset Management - An essential pillar for the digital en...
BMC Engage 2015: IT Asset Management - An essential pillar for the digital en...
Jon Stevens-Hall
 
BMC Engage 2015: Smart IT, MyIT and the Power of the Service Platform
BMC Engage 2015: Smart IT, MyIT and the Power of the Service PlatformBMC Engage 2015: Smart IT, MyIT and the Power of the Service Platform
BMC Engage 2015: Smart IT, MyIT and the Power of the Service Platform
Jon Stevens-Hall
 
IT Trends Set to Shape Software Asset Management (IBSMA SAM Summit June 2015)
IT Trends Set to Shape Software Asset Management (IBSMA SAM Summit June 2015)IT Trends Set to Shape Software Asset Management (IBSMA SAM Summit June 2015)
IT Trends Set to Shape Software Asset Management (IBSMA SAM Summit June 2015)
Jon Stevens-Hall
 
Bridging the Gap - The Value of Integrated Asset and Service Management
Bridging the Gap - The Value of Integrated Asset and Service ManagementBridging the Gap - The Value of Integrated Asset and Service Management
Bridging the Gap - The Value of Integrated Asset and Service Management
Jon Stevens-Hall
 
BMC Engage - ITAM 2015-2020: The Evolving Role of the IT Asset Manager
BMC Engage - ITAM 2015-2020: The Evolving Role of the IT Asset ManagerBMC Engage - ITAM 2015-2020: The Evolving Role of the IT Asset Manager
BMC Engage - ITAM 2015-2020: The Evolving Role of the IT Asset Manager
Jon Stevens-Hall
 
Bridging the Gap - the Value of Integrated Asset and Service Management
Bridging the Gap - the Value of Integrated Asset and Service ManagementBridging the Gap - the Value of Integrated Asset and Service Management
Bridging the Gap - the Value of Integrated Asset and Service Management
Jon Stevens-Hall
 

More from Jon Stevens-Hall (15)

DevOps Enterprise Summit Las Vegas 2018: The Problem of Becoming a 3rd-Line S...
DevOps Enterprise Summit Las Vegas 2018: The Problem of Becoming a 3rd-Line S...DevOps Enterprise Summit Las Vegas 2018: The Problem of Becoming a 3rd-Line S...
DevOps Enterprise Summit Las Vegas 2018: The Problem of Becoming a 3rd-Line S...
 
Service Manager Dag, Netherlands 2018: Why we should ditch the 3-tier support...
Service Manager Dag, Netherlands 2018: Why we should ditch the 3-tier support...Service Manager Dag, Netherlands 2018: Why we should ditch the 3-tier support...
Service Manager Dag, Netherlands 2018: Why we should ditch the 3-tier support...
 
Configuration Management Camp 2018: The problem of becoming "3rd line support...
Configuration Management Camp 2018: The problem of becoming "3rd line support...Configuration Management Camp 2018: The problem of becoming "3rd line support...
Site Reliability Engineering: Harnessing (and redefining) it for ITSM

  • 1. Site Reliability Engineering: Harnessing (and redefining) it for ITSM. Jon Stevens-Hall, Principal Product Manager (BMC Software) and ITIL® 4 Contributing Author. “Service Management Post Corona” virtual event, 10th June 2020.
  • 2. A brief introduction to SRE • First published 2016 • Free online: https://landing.google.com/sre/books/ “SRE is what happens when you ask a software engineer to design an operations team” • Some SREs at Google are software engineers hired in the standard way • Others were close to being recruited as a software engineer, and brought other infra skills • SREs help software developers run their services but also expect accountability from them. @jonstevenshall
  • 3. Me • ITSM veteran – 23 years • Principal Product Manager with BMC Software, for our enterprise ITSM toolset, BMC Helix ITSM • ITIL® 4 contributing author • Strong focus on DevOps and new ways of working • Asked by SDI to talk about SRE today • Twitter: @JonStevensHall • https://www.bmc.com/blogs/author/jonhall/ • https://medium.com/@jonhall_
  • 4. Subjects covered in the SRE handbook: Risk • Service Level Objectives • Eliminating Toil • Monitoring • Distributed Systems • Automation • Release Engineering • Alerting • On-Call • Handling Incidents • Postmortem • Scheduling • Human Welfare • Communication. Highlighted: Risk, Service Level Objectives, Eliminating Toil.
  • 5. Defining Toil • Google’s SRE handbook describes “toil” as work which is: Manual, Repetitive, Automatable, Tactical, Devoid of enduring value, Linearly scaling. • SRE teams can spend a maximum of 50% of their time on work considered to be “toil”.
  • 6. Service Level Objective • Availability = Successful requests / Total requests • Availability = Uptime / (Uptime + Downtime) • SLOs are not SLAs. Not all Google services have SLAs, but some do. • SLOs define appropriate technical bounds for service performance. • “99.9% of RPC calls will complete in less than 100 ms” • “The service will be available 99.99% of the time”
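
The two availability definitions on slide 6 can be compared against an SLO target with a short sketch. The function names and the sample figures are illustrative assumptions, not taken from the talk or from Google's tooling.

```python
def request_availability(successful_requests: int, total_requests: int) -> float:
    """Availability as the fraction of requests that succeeded."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


def time_availability(uptime_seconds: float, downtime_seconds: float) -> float:
    """Availability as the fraction of elapsed time the service was up."""
    total = uptime_seconds + downtime_seconds
    if total <= 0:
        raise ValueError("uptime + downtime must be positive")
    return uptime_seconds / total


def meets_slo(availability: float, slo_target: float) -> bool:
    """True when measured availability is at or above the SLO target."""
    return availability >= slo_target


# Hypothetical sample: 999,500 successful requests out of 1,000,000.
measured = request_availability(999_500, 1_000_000)
print(meets_slo(measured, 0.999))   # compared with a 99.9% target
print(meets_slo(measured, 0.9999))  # compared with a 99.99% target
```

The same measurement can pass one objective and miss a stricter one, which is why the target itself has to be chosen deliberately for each service.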
  • 7. Error Budget: Defines how unreliable the service is allowed to be, within a given time period. [Chart labels: 100%; Required performance (SLO target); Remaining error budget; Outages to date]
  • 8. Outages are okay, within error budget tolerances: “An outage is no longer a bad thing. It is an expected part of the process of innovation, and an occurrence that development and SRE teams manage rather than fear”
  • 9. Error Budget • SLO: 99.9%, so the error budget for the quarter is 0.1% • Outages already: 0.05% • Error budget remaining: 0.05% • Hence an additional outage which consumes 0.02% of the quarter’s availability would cost 40% of the remaining error budget.
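
The arithmetic on slide 9 can be reproduced with a minimal sketch. The function names are hypothetical; the figures (99.9% SLO, 0.05% already consumed, a further 0.02% outage) are the ones on the slide.

```python
def error_budget(slo_target_pct: float) -> float:
    """Total allowed unreliability, in percentage points, for the period."""
    return 100.0 - slo_target_pct


def remaining_budget(slo_target_pct: float, consumed_pct: float) -> float:
    """Error budget left after the outages recorded so far."""
    return error_budget(slo_target_pct) - consumed_pct


def share_of_remaining(slo_target_pct: float, consumed_pct: float,
                       new_outage_pct: float) -> float:
    """Fraction of the remaining budget that a new outage would consume."""
    remaining = remaining_budget(slo_target_pct, consumed_pct)
    if remaining <= 0:
        raise ValueError("error budget is already exhausted")
    return new_outage_pct / remaining


share = share_of_remaining(slo_target_pct=99.9, consumed_pct=0.05,
                           new_outage_pct=0.02)
# Rounded for display, since the inputs are binary floating-point values.
print(f"{share:.0%} of the remaining error budget")
```

With the slide's figures, the new outage works out to roughly 40% of what is left, matching the slide.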
  • 10. Balancing innovation and reliability: Error Budget shapes ways of working • Large budget: developers can take more risks • Budget nearly drained: developers tend to slow down • Budget used up: releases temporarily halted (Image: Martin Vorel via Libreshot)
  • 11. How does SRE align to ITSM as we know it? “Site Reliability Engineering (SRE) is an emerging IT service management (ITSM) framework” (First line of the report, published 1st April 2020)
  • 12. Will SRE replace ITSM as we know it? “As a service management alternative, SRE updates traditional ITSM activities with innovative and self-organizing concepts such as… • Management to service level objectives • Error budgets • Toil reduction • Release engineering • Monitoring/observability • Embracing risk” Jayne Groll (@JayneGroll), DevOps.com, 2020
  • 13. A counterpoint: Most of us are not Google. “If you think spinning up an ops team and calling it SRE is ‘an implementation of DevOps’ you’ve swallowed the worst poison pill the DevOps talk circuit can deal to you.” (The Agile Admin blog, 2018) • Unless you are Google scale, a separate production operations team is inefficient • DevOps showed that “throw over the wall from Dev to Ops” was wrong • Throwing over the wall from Dev to SRE doesn’t improve that
  • 14. One size doesn’t fit all “Toyota opens its factory doors to us again and again, but I imagine Toyota’s leaders may also be shaking their heads and thinking, ‘Sure, come have a look. But why are you so interested in the solutions we develop for our specific problems? Why do you never study how we go about developing those solutions?’” Mike Rother – Toyota Kata (2009)
  • 15. What’s in a name? Rethinking SRE for ITSM • “Site Reliability Engineering”: • In most cases, IT services aren’t “sites” • A lot of ITSM work isn’t software engineering • Reliability is a broad term • Also, it’s not easy to apply directly to commercial off-the-shelf software (especially SaaS) which underpins many IT services. © Copyright 2020 BMC Software, Inc.
  • 16. SRE is already an established profession (Job advert for “disruptive Fintech”, retrieved May 2020)
  • 17. ITIL® 4 maps SRE principles to 14 practices (but at a very high level) • Availability management • Capacity and performance management • Change enablement • Incident management • Infrastructure and platform management • Monitoring and event management • Problem management • Service design • Software development and management • Deployment management • Organizational change management • Release management • Service configuration management • Service validation and testing
  • 18. Example of an “SRE-like” change for IT support. Before: • Current service desk performance: 98% call pickup, on a target of 90% • Agents fully utilised on calls. Applying SRE thinking: • Schedule a proportion of each agent’s time to be away from calls • Use that time for proactive reduction of toil (e.g. building self-service offerings, developing knowledge content) • The previous 8 percentage points of over-performance in call pickup are spent as “error budget”
  • 19. SRE applied to Change Enablement. Before: • Change approval interventions determined by perceived risk of change type • Pressure from DevOps to remove approval barriers (particularly CAB). Applying SRE thinking: • Dynamically assess approval level based on confidence level in team: implementation success rate of previous changes; issues caused by previous changes; adherence to guardrails in the DevOps toolchain • Minimal intervention, applied more smartly
  • 20. Example of Change Enablement with SRE principles • Two development teams with different track records (Dev Group 2 has higher initial confidence than Dev Group 1) • Approval level adjusts dynamically to changing success rates (watch what happens after a Dev Group 2 failure)
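
The idea on slides 19 and 20, an approval level that follows a team's track record, could be sketched as below. This is only an illustration: the 0 to 100 score, the gain and penalty sizes, the thresholds, and the approval level names are all assumptions, not the model used in the talk or in any BMC product.

```python
from dataclasses import dataclass


@dataclass
class TeamConfidence:
    """Hypothetical confidence score for a team, from 0 (none) to 100 (full)."""
    score: int
    gain_on_success: int = 5      # assumed: trust is earned slowly
    penalty_on_failure: int = 30  # assumed: a failed change costs much more

    def record_change(self, succeeded: bool) -> None:
        """Adjust the score after a change is implemented."""
        if succeeded:
            self.score = min(100, self.score + self.gain_on_success)
        else:
            self.score = max(0, self.score - self.penalty_on_failure)


def approval_level(score: int) -> str:
    """Map a confidence score to an approval intervention (assumed thresholds)."""
    if score >= 80:
        return "auto-approve"
    if score >= 50:
        return "peer review"
    return "CAB review"


dev_group_1 = TeamConfidence(score=40)
dev_group_2 = TeamConfidence(score=85)
print(approval_level(dev_group_1.score), approval_level(dev_group_2.score))

dev_group_2.record_change(succeeded=False)  # a Dev Group 2 failure
for _ in range(3):
    dev_group_1.record_change(succeeded=True)
print(approval_level(dev_group_1.score), approval_level(dev_group_2.score))
```

Making the penalty larger than the gain is a design choice in this sketch: a single failure quickly brings back more oversight, while sustained success gradually removes it.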
  • 21. Summary of issues • Site Reliability Engineering is a bad name for many aspects of ITSM. • SRE is already defined (and recruited) as a specialist operations developer role. • Some key, novel themes have great potential but need redefinition for much of ITSM.
  • 22. Proposal • “Rebrand” the SRE concept for ITSM: New name, new thinking. • Identify new ways to apply key novel principles such as reliability and toil reduction. • Invest in ITSM initiatives such as KCS which share good principles with SRE.
  • 23. Referenced materials and further reading • https://theagileadmin.com/2018/10/02/sre-the-biggest-lie-since-kanban/ (@ernestmueller) • https://devops.com/sre-is-the-most-innovative-approach-to-itsm-since-itil/ (@JayneGroll) • http://www-personal.umich.edu/~mrother/Homepage.html (@RealMikeRother) • https://www.bmc.com/blogs/sre-vs-itops/ (@DevDroid9) • https://medium.com/@JonHall_/what-can-site-reliability-engineers-teach-service-desks-about-toil-e1df38ca284e (@jonstevenshall) • https://landing.google.com/sre/books/ (@srebook)