SysAdmin to SRE:
Creating Capacity to Make Tomorrow Better Than Today
How Runbook Automation for Incident Management and Other Self-Service Operations Practices
Can Ignite the Way to True SRE Outcomes
jorn knuttila
@jorn_knuttila
Not that far away, maybe in a company just like yours…
🔥
Overloaded. Constant firefighting.
[Slide graphic: a pile of tickets surrounding "Project A" and "Project B", with deadlines marked "DUE: Yesterday!" and "DUE: Tomorrow!"]
🔥
Waiting in ticket queues for everything.
[Slide graphic: a queue of tickets]
🧨
Things break. Break again. And again.
[Slide graphic: the "same" failure returning "Later…" and "Later…" again; each occurrence goes Help! → Ticket → Wait → Interrupt]
⁉
Everyone is busy, but it doesn’t get any better.
[Slide graphic: workload blocks labeled "Improvement Project", "Business Delivery", and "Incidents"]
🔥 Overloaded. Constant firefighting.
🔥 Waiting in ticket queues for everything.
🔥 Things break. Break again. And again.
🔥 Everyone is busy, but it doesn’t get any better.
Executives: “Everything takes too long, costs too much, and breaks too often!”
SRE
Have you heard of SRE?
Google does it.
[Slide graphic: the badge "Jane Doe, Systems Administrator" becomes "Jane Doe, SRE"]
We have SysAdmins. They should be SREs!
SysAdmins
Overloaded. Constant firefighting.
Waiting in ticket queues for everything.
Things break. Break again. And again.
Everyone is busy, but it doesn’t get any better.
Executive View: Everything takes too long, costs too much, and breaks too often!
SRE (new name)
Overloaded. Constant firefighting.
Waiting in ticket queues for everything.
Things break. Break again. And again.
Everyone is busy, but it doesn’t get any better.
Executive View: Everything takes too long, costs too much, and breaks too often!
Changing job titles or adding individual skills doesn’t make systems administrators SREs.
Observability, Programming Skills, Distributed Systems Arch., Blameless Post-Mortems: Not SRE
SRE is a rethinking of how Operations work gets done.
Principles are what make SRE different
1. SRE needs Service Level Objectives, with consequences
Stephen Thorne (Google), “Principles of SRE”, DevOps Enterprise Summit London 2018
https://youtu.be/c-w_GYvi0eA
SLO and Error Budgets: Tools for Shared Responsibility
[Slide graphic: a scale from 0 to 100 marking the Service Level Indicator, the Service Level Objective, and the Error Budget*]
(*Use this to improve the service)
Dev, Biz, Ops: SLO takes priority!!
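The arithmetic behind the SLO and error budget slide can be sketched as follows. This is a minimal illustration, assuming a request-based availability SLI. The function name, the 99.9% objective, and the request counts are hypothetical and do not come from the talk.

```python
def error_budget_report(slo_percent, total_requests, failed_requests):
    """Summarize SLI and error budget for one measurement window.

    Assumption: the SLI is the percentage of successful requests.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # Service Level Indicator: what was actually delivered.
    sli_percent = 100.0 * (total_requests - failed_requests) / total_requests

    # Error budget: the gap the SLO leaves between the objective and 100%.
    budget_fraction = (100.0 - slo_percent) / 100.0
    allowed_failures = budget_fraction * total_requests

    return {
        "sli_percent": sli_percent,
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "slo_met": sli_percent >= slo_percent,
    }


# Illustrative numbers only.
report = error_budget_report(
    slo_percent=99.9, total_requests=1_000_000, failed_requests=400
)
print(report)
```

With these made-up numbers the service has used about 400 of roughly 1,000 allowed failures. A negative `remaining_failures` would be the signal that the SLO takes priority over new feature work.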
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
Toil: Name For a Problem We’ve All Felt
“Toil is the kind of work tied to running a production
service that tends to be manual, repetitive,
automatable, tactical, devoid of enduring value, and
that scales linearly as a service grows.” -Vivek Rau (Google)
Toil vs. Engineering Work
Toil | Engineering Work
Lacks Enduring Value | Builds Enduring Value
Rote, Repetitive | Creative, Iterative
Tactical | Strategic
Increases With Scale | Enables Scaling
Can Be Automated | Requires Human Creativity
Excessive Toil Prevents Fixing the System
Toil at manageable percentage of capacity: engineering work can reduce toil and improve the business
Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”): no capacity to reduce toil, no capacity to improve business
Downward spiral is inevitable!
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
SRE teams have the ability to regulate their workload
What if handing off responsibility to SRE/Ops wasn’t a right?
(separate the “running in production” from “run by SRE/Ops”)
“?!?”
Where to start (the practical approach)
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
Company-wide culture change (hard!)
Company-wide culture change (hard!)
Reduce toil.
Everybody wins!
Why focus on reducing toil?
1. Lots of value independent of “SRE”
2. Your people are your most expensive assets … stay out of their way!
Super easy to get started reducing toil
1. Track toil levels for each team
2. Set toil limit for each team (50% is conventional wisdom)
3. Fund efforts to reduce toil (with emphasis on teams already over limit)
Toil
↳ Refactor apps, tools, and processes
↳ Apply self-service design pattern
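The three steps above (track, set a limit, fund the teams over it) can be sketched in a few lines. This is a rough illustration, assuming toil is tracked as hours per team per reporting period. The team names, hours, and the `TeamPeriod` type are hypothetical. Only the 50% limit comes from the slide.

```python
from dataclasses import dataclass

TOIL_LIMIT = 0.50  # the conventional 50% limit named on the slide


@dataclass
class TeamPeriod:
    """Hours logged by one team in one reporting period (hypothetical shape)."""
    team: str
    toil_hours: float
    total_hours: float

    @property
    def toil_fraction(self) -> float:
        return self.toil_hours / self.total_hours


def teams_over_limit(records, limit=TOIL_LIMIT):
    """Return (team, toil_fraction) pairs above the limit, worst first."""
    over = [(r.team, r.toil_fraction) for r in records if r.toil_fraction > limit]
    return sorted(over, key=lambda pair: pair[1], reverse=True)


# Made-up data for illustration.
records = [
    TeamPeriod("platform", toil_hours=130, total_hours=200),
    TeamPeriod("database", toil_hours=90, total_hours=200),
    TeamPeriod("network", toil_hours=110, total_hours=200),
]

funding_priority = teams_over_limit(records)
for team, fraction in funding_priority:
    print(f"{team}: {fraction:.0%} toil, over the {TOIL_LIMIT:.0%} limit")
```

Sorting worst first mirrors step 3: teams already over the limit are the first candidates for funded toil-reduction work.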
How to enable self-service?
Empower teams to spot and fix the anti-patterns.
“Do this for me, do it again, then do it again.”
“I could fix it, but I can’t get to it.”
“The dog-pile.”
What do people read even less than documentation?
“I’m an expert, I don’t read the wiki.”
“Dev work is more expensive than Ops work”
Self-Service Automation for all your …
Self-Service Operations Design Pattern (in a nutshell)
Pull-Based
Accept tools/languages that teams want to use
Let people who “push buttons” define the buttons
Build in security and compliance
Define “guardrails” to provide work safety
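One way to picture the pattern above is a small job portal. Experts register a procedure once, and requesters pull it when they need it, inside role checks and option guardrails. This is only a sketch under stated assumptions: the `Job` and `Portal` classes, the role names, and the `restart_service` placeholder are hypothetical and do not represent any particular product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Job:
    name: str
    action: Callable[[Dict[str, str]], str]  # procedure defined by the experts
    allowed_roles: Set[str]                  # security: who may run it
    allowed_values: Dict[str, Set[str]]      # guardrails: permitted option values


@dataclass
class Portal:
    jobs: Dict[str, Job] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)  # compliance trail

    def register(self, job: Job) -> None:
        self.jobs[job.name] = job

    def run(self, user: str, role: str, job_name: str, options: Dict[str, str]) -> str:
        """Pull-based: the requester starts the job, no ticket queue involved."""
        job = self.jobs[job_name]
        if role not in job.allowed_roles:
            self.audit_log.append(f"DENIED {user} {job_name}")
            raise PermissionError(f"{role} may not run {job_name}")
        for key, value in options.items():
            if value not in job.allowed_values.get(key, set()):
                self.audit_log.append(f"REJECTED {user} {job_name} {key}={value}")
                raise ValueError(f"{key}={value} is outside the guardrails")
        self.audit_log.append(f"RAN {user} {job_name} {options}")
        return job.action(options)


def restart_service(options: Dict[str, str]) -> str:
    # Placeholder: a real job would wrap the team's existing script or tool.
    return f"restarted {options['service']} in {options['env']}"


portal = Portal()
portal.register(Job(
    name="restart-service",
    action=restart_service,
    allowed_roles={"developer", "sre"},
    allowed_values={"service": {"checkout", "search"}, "env": {"staging"}},
))

result = portal.run(
    "jane", "developer", "restart-service",
    {"service": "checkout", "env": "staging"},
)
print(result)
```

Here the action is an ordinary callable, so teams can wrap whatever tools or languages they already use. The people who would otherwise push the button decide what goes into `allowed_values`.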
Recap: Creating Capacity to Make Tomorrow Better Than Today
SRE is more than a title
Be practical and start focusing on toil
Find and fix toil anti-patterns
Error Budgets and Toil Limits
Apply Self-Service Operations design pattern
SRE is a new way to think about Ops work
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
Let’s talk…
@jorn_knuttila
📧 jorn@rundeck.rocks

2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1
