SlideShare a Scribd company logo
1 of 113
Download to read offline
SRE for Everyone:
Making Tomorrow Better Than Today

Damon Edwards

@damonedwards
2019
Not that far away, maybe in a company just like yours…
Not that far away, maybe in a company just like yours…
Overloaded. Constant firefighting.
Ticket
Ticket
Project A
···
Project B
···
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
DUE: Yesterday! DUE: Tomorrow!
Ticket
Ticket
Ticket
Waiting in ticket queues for everything.
Not that far away, maybe in a company just like yours…
Waiting in ticket queues for everything.
Ticket
Not that far away, maybe in a company just like yours…
Waiting in ticket queues for everything.
Ticket
Ticket
Ticket
Ticket
Ticket
Ticket
Not that far away, maybe in a company just like yours…
Things break. Break again. And again.
Later…
Later…
same
same
Help!
Ticket
Wait Interrupt
Help!
Ticket
Wait Interrupt
Help!
Ticket
Wait Interrupt
Not that far away, maybe in a company just like yours…
Everyone is busy, but it doesn’t get any better.
Improvement
Project
Business
Delivery
Incidents
Business
Delivery
Business
Delivery
Not that far away, maybe in a company just like yours…
Overloaded. Constant firefighting.
Waiting in ticket queues for everything.
Things break. Break again. And again.
Everyone is busy, but it doesn’t get any better.
Not that far away, maybe in a company just like yours…
Overloaded. Constant firefighting.
Waiting in ticket queues for everything.
Things break. Break again. And again.
Everyone is busy, but it doesn’t get any better.
Not that far away, maybe in a company just like yours…
Everything takes too long, costs
too much, and breaks too often!
Executives

Have you heard of SRE?
Google does it.
“SRE…
When you ask
software engineers
to do operations”
“SRE…
Next-generation,
cloud-native
Operations”
Class SRE implements DevOps
“SRE…
When Ops does
more engineering
than Ops”
“SRE…
When you ask
software engineers
to do operations”
“SRE…
Next-generation,
cloud-native
Operations”
Class SRE implements DevOps
“SRE…
When Ops does
more engineering
than Ops”
SRE
Have you heard of SRE?
Google does it.
Jane Doe
Systems Administrator
Jane Doe
Systems Administrator
We have
SysAdmins
Jane Doe
Systems Administrator
They should be
SREs!
Jane Doe
SRE
They should be
SREs!
ITIL Book 1
ITIL Book 2
ITIL Book 3
ITIL Book 4
ITIL Book 5
Quality!
is job
#1
Sys
Admin
CAB CALENDAR
Your new title is SRE.
Now write code and be better at ops.
PROVISIONING PROCESS
Dilbert characters © Scott Adams www.dilbert.com
Sys
Admin
CAB CALENDAR
our new title is SRE.
w write code and be better at ops.
PROVISIONING PROCESS
Dilbert characters © Scott Adams www.dilbert.com
SysAdmins
Overloaded. Constant
firefighting.
Waiting in ticket queues
for everything.
Things break. Break
again. And again.
Everyone is busy, but it
doesn’t get any better.
ansformation has largely
nored Ops. Any ideas?
Have you heard of SRE?
Google does it.
Everything takes too
long, cost too much, and
break too often!
Executive

View
SysAdmins
Overloaded. Constant
firefighting.
Waiting in ticket queues
for everything.
Things break. Break
again. And again.
Everyone is busy, but it
doesn’t get any better.
ansformation has largely
nored Ops. Any ideas?
Have you heard of SRE?
Google does it.
Everything takes too
long, cost too much, and
break too often!
Executive

View
SRE (new name)
Overloaded. Constant
firefighting.
Waiting in ticket queues
for everything.
Things break. Break
again. And again.
Everyone is busy, but it
doesn’t get any better.
Our transformation has largely
ignored Ops. Any ideas?
Have you h
Google
Everything takes too
long, cost too much, and
break too often!
Executive

View
Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
Observability
Programming
Skills
Distributed
Systems Arch.
Blameless
Post-Mortems
Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
Observability
Programming
Skills
Distributed
Systems Arch.
Blameless
Post-Mortems
000000000000000
Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
Not SRE
Observability
Programming
Skills
Distributed
Systems Arch.
Blameless
Post-Mortems
000000000000000
Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
SRE is a rethinking of how Operations work gets
done.
Principles are what makes SRE different
Principles are what makes SRE different
Stephen Thorne, Google

At DevOps Enterprise Summit

London 2018
“Principles of SRE”
https://youtu.be/c-w_GYvi0eA
Principles are what makes SRE different
1. SRE needs Service Level Objectives, with consequences
Stephen Thorne, Google

At DevOps Enterprise Summit

London 2018
“Principles of SRE”
https://youtu.be/c-w_GYvi0eA
SLO and Error Budgets: Tools for Shared Responsibility
0
100
Service Level Objective
Error Budget*
Service Level Indicator
(*Use this to improve the service)
SLO and Error Budgets: Tools for Shared Responsibility
0
100
Service Level Objective
Error Budget*
Service Level Indicator
(*Use this to improve the service)
SLO and Error Budgets: Tools for Shared Responsibility
0
100
Service Level Objective
Error Budget*
Service Level Indicator
(*Use this to improve the service)
DEV
BIZ
Ops
SLO and Error Budgets: Tools for Shared Responsibility
0
100
Service Level Objective
Error Budget*
Service Level Indicator
(*Use this to improve the service)
DEV
BIZ
Ops
SLO takes priority!!
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences

2. SREs have time to make tomorrow better than today
Toil: Name For a Problem We’ve All Felt
Toil: Name For a Problem We’ve All Felt
“Toil is the kind of work tied to running a production
service that tends to be manual, repetitive,
automatable, tactical, devoid of enduring value, and
that scales linearly as a service grows.”
-Vivek Rau

Google
Toil vs. Engineering Work
Toil Engineering Work
Lacks Enduring Value Builds Enduring Value
Rote, Repetitive Creative, Iterative
Tactical Strategic
Increases With Scale Enables Scaling
Can Be Automated Requires Human Creativity
Excessive Toil Prevents Fixing the System
Toil Engineering Work
E.W.Toil
Reduce toil
Improve the business ǡ
No capacity to reduce toil
No capacity to improve business
Toil at manageable percentage of capacity
Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”)
Excessive Toil Prevents Fixing the System
Toil Engineering Work
E.W.Toil
Reduce toil
Improve the business ǡ
No capacity to reduce toil
No capacity to improve business
Toil at manageable percentage of capacity
Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”)
Excessive Toil Prevents Fixing the System
Toil Engineering Work
E.W.Toil
Reduce toil
Improve the business ǡ
No capacity to reduce toil
No capacity to improve business
Toil at manageable percentage of capacity
Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”)
Downward spiral is inevitable!
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences

2. SREs have time to make tomorrow better than today
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences

2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
SRE teams have the ability to regulate their workload
SRE teams have the ability to regulate their workload
Example:
SRE teams have the ability to regulate their workload
Example:
What if handing-off responsibility to SRE/Ops wasn’t a right?
SRE teams have the ability to regulate their workload
Example:
What if handing-off responsibility to SRE/Ops wasn’t a right?
(separate the “running in production” from “run by SRE/Ops”)
SRE teams have the ability to regulate their workload
Example:
What if handing-off responsibility to SRE/Ops wasn’t a right?
(separate the “running in production” from “run by SRE/Ops”)
“?!?”
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences

2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
Where to start (the practical approach)
Where to start (the practical approach)
1. SRE needs Service Level Objectives, with consequences

2. SREs have time to make tomorrow better than today

3. SRE teams have the ability to regulate their workload
Where to start (the practical approach)
1. SRE needs Service Level Objectives, with consequences

2. SREs have time to make tomorrow better than today

3. SRE teams have the ability to regulate their workload
Company-wide culture change (hard!)
Where to start (the practical approach)
1. SRE needs Service Level Objectives, with consequences

2. SREs have time to make tomorrow better than today

3. SRE teams have the ability to regulate their workload
Company-wide culture change (hard!)
Company-wide culture change (hard!)
Where to start (the practical approach)
1. SRE needs Service Level Objectives, with consequences

2. SREs have time to make tomorrow better than today

3. SRE teams have the ability to regulate their workload
Company-wide culture change (hard!)
Company-wide culture change (hard!)
Reduce toil.

Everybody wins!
Where to start (the practical approach)
1. SRE needs Service Level Objectives, with consequences

2. SREs have time to make tomorrow better than today

3. SRE teams have the ability to regulate their workload
Company-wide culture change (hard!)
Company-wide culture change (hard!)
Reduce toil.

Everybody wins!
Why focus on reducing toil?
Why focus on reducing toil?
1. Lots of value independent of “SRE”
2. Your people are you most expensive assets

… stay out of their way!
Why focus on reducing toil?
1. Lots of value independent of “SRE”
Start reducing toil today
Toil
Start reducing toil today
1. Track toil levels for each team
Toil
Start reducing toil today
1. Track toil levels for each team
Toil
Track toil levels for each team
Track toil levels for each team
• Standardize (e.g. meetings and email are “overhead" not “toil”)
Track toil levels for each team
• Standardize (e.g. meetings and email are “overhead" not “toil”)
• Track

• Self-reporting

• Periodic surveys

• SM or PM interview/sampling
Track toil levels for each team
• Standardize (e.g. meetings and email are “overhead" not “toil”)
• Track

• Self-reporting

• Periodic surveys

• SM or PM interview/sampling
• Don’t get lost in time tracking weeds!
Start reducing toil today
1. Track toil levels for each team
Toil
Start reducing toil today
1. Track toil levels for each team
Toil
2. Set toil limit for each team (50% is conventional wisdom)
Start reducing toil today
1. Track toil levels for each team

2. Set toil limit for each team (50% is conventional wisdom)

3. Fund efforts to reduce toil (with emphasis on teams already over limit)
Toil
Start reducing toil today
1. Track toil levels for each team

2. Set toil limit for each team (50% is conventional wisdom)

3. Fund efforts to reduce toil (with emphasis on teams already over limit)
Toil
Michael Kehoe

Todd Palino 

(LinkedIn)

At SREcon Americas 2019

Example
Process
“Code Yellow”
Where to focus?
Toil
Where to focus?
Toil
Reduce
Technical Debt
Where to focus?
Toil
Reduce
Technical Debt
Re-Engineer

Processes
Where to focus?
Toil
Reduce
Technical Debt
Re-Engineer

Processes
Enable
Self-Service
Where to focus?
Toil
Reduce
Technical Debt
Re-Engineer

Processes
Enable
Self-Service
Eliminate Interruptions
Eliminate Waiting
Eliminate Interruptions
Eliminate Waiting
Self-Service
Do X.
Eliminate Interruptions
Eliminate Waiting
Self-Service
Do X.
… and a lot less toil
How to enable self-service?
Empower teams to spot and fix the anti-patterns.
“Do this for me, do it again, then do it again.”
Done.I need you
to do X
Your
other
work
I need you
to do X
I need you
to do X
Ticket
Do X
Later…
Do X
Do X
Done.
Done.
Your
other
work
Self-Service
Self-Service
Self-Service
Your
other
work x2
Your
other
work x3
Later…Later…
Later…
Your
other
work
Your
other
work
After
Before
Wait Interrupt
Ticket
Wait Interrupt
Ticket
Wait Interrupt
“Do this for me, do it again, then do it again.”
Done.I need you
to do X
Your
other
work
I need you
to do X
I need you
to do X
Ticket
Do X
Later…
Do X
Do X
Done.
Done.
Your
other
work
Self-Service
Self-Service
Self-Service
Your
other
work x2
Your
other
work x3
Later…Later…
Later…
Your
other
work
Your
other
work
After
Before
Wait Interrupt
Ticket
Wait Interrupt
Ticket
Wait Interrupt
“I could fix it, but I can’t get to it.”
Environment
I could fix it if I
could get to it
Before
Wait
Interrupt
“I could fix it, but I can’t get to it.”
Environment
I could fix it if I
could get to it
Before
Wait
Interrupt
After
I’ve got this!
Environment
Self-
Service
“The dog-pile.”
!!
I think its a problem with
db07-store2.uswest.acme
“$ top”
“$ top”
db07store2.
uswest.acme
“$ top”
“$ top”
“$ top”
!!
“$ top”
!!
!!
!!
healthcheck
store2 -all
db07store2.
uswest.acme
Self-Service
1.
2.
3.
I think its a problem with
db07-store2.uswest.acme
“I’m an expert, I don’t read the wiki.”
docs
Service has changed. Use this flag or
bad things will happen!
Pause monitoring first or
we all get woken up!
“restart -doit -now”
I’ve done this before.
I’ve got this…
Environment
docs
Later…
Before
“I’m an expert, I don’t read the wiki.”
docs
Service has changed. Use this flag or
bad things will happen!
Pause monitoring first or
we all get woken up!
“restart -doit -now”
I’ve done this before.
I’ve got this…
Environment
docs
Later…
Before
“I’m an expert, I don’t read the wiki.”
docs
Service has changed. Use this flag or
bad things will happen!
Pause monitoring first or
we all get woken up!
“restart -doit -now”
I’ve done this before.
I’ve got this…
Environment
docs
Later…
Before
Service has changed. Use this flag or
bad things will happen!
Pause monitoring first or
we all get woken up!
“restart”
Environment
Later…
Update
Restart Job
✅
I’ve done this before.
I’ve got this.
Self-Service
Self-Service
After
“Known issue… doesn’t get permanent fix”
“Known issue… doesn’t get permanent fix”
Self-Service Operations Design Pattern (in a nutshell)
Consumer of
Ops Capabilities
Self-Service
On
Demand
Ops Capability
Specialist
Knowledge
Ops Capability
Specialist
Knowledge
Self-Service Operations Design Pattern (in a nutshell)
Pull-Based
Consumer of
Ops Capabilities
Self-Service
On
Demand
Ops Capability
Specialist
Knowledge
Ops Capability
Specialist
Knowledge
Self-Service Operations Design Pattern (in a nutshell)
Pull-Based
Accept tools/languages
that teams want to use
Consumer of
Ops Capabilities
Self-Service
On
Demand
Ops Capability
Specialist
Knowledge
Ops Capability
Specialist
Knowledge
Self-Service Operations Design Pattern (in a nutshell)
Pull-Based
Accept tools/languages
that teams want to use
Define “guardrails” to
provide work safety
Consumer of
Ops Capabilities
Self-Service
On
Demand
Ops Capability
Specialist
Knowledge
Ops Capability
Specialist
Knowledge
Self-Service Operations Design Pattern (in a nutshell)
Pull-Based
Accept tools/languages
that teams want to use
Let people who
“push buttons”
define the buttons
Define “guardrails” to
provide work safety
Consumer of
Ops Capabilities
Self-Service
On
Demand
Ops Capability
Specialist
Knowledge
Ops Capability
Specialist
Knowledge
Self-Service Operations Design Pattern (in a nutshell)
Pull-Based
Accept tools/languages
that teams want to use
Let people who
“push buttons”
define the buttons
Build in security
and compliance
Define “guardrails” to
provide work safety
Consumer of
Ops Capabilities
Self-Service
On
Demand
Ops Capability
Specialist
Knowledge
Ops Capability
Specialist
Knowledge
Self-Service is ultimately about user experience
Consumer of
Ops Capabilities
Self-Service
On
Demand
Ops Capability
Specialist
Knowledge
Ops Capability
Specialist
Knowledge
Self-Service is ultimately about user experience
Consumer of
Ops Capabilities
Self-Service
On
Demand
Ops Capability
Specialist
Knowledge
Ops Capability
Specialist
Knowledge
1.Work how they want to work (GUI, API, CLI)
Self-Service is ultimately about user experience
Consumer of
Ops Capabilities
Self-Service
On
Demand
Ops Capability
Specialist
Knowledge
Ops Capability
Specialist
Knowledge
1.Work how they want to work (GUI, API, CLI)
2. “Guardrails” (Smart options that helpfully constrain)
Self-Service is ultimately about user experience
Consumer of
Ops Capabilities
Self-Service
On
Demand
Ops Capability
Specialist
Knowledge
Ops Capability
Specialist
Knowledge
1.Work how they want to work (GUI, API, CLI)
2. “Guardrails” (Smart options that helpfully constrain)
3.Dynamic resource model

(Up-to-date details of your environment)
Self-Service can also be a foundation
for strategic initiatives
Strategic: Improve incident response times
https://youtu.be/USYrDaPEFtM
Jody Mulkey at DOES ‘15 SF
Strategic: Improve incident response times
https://youtu.be/USYrDaPEFtM
Jody Mulkey at DOES ‘15 SF
Services Monitoring Scripts/Tools Services Monitoring Scripts/ToolsServices Monitoring Scripts/Tools
DEV STAGE PROD
Dev & QA NOC/Ops Dev
Promote
approved
jobs
Self-Service Self-Service
Empower
Strategic: Improve incident response times
https://youtu.be/USYrDaPEFtM
Jody Mulkey at DOES ‘15 SF
Services Monitoring Scripts/Tools Services Monitoring Scripts/ToolsServices Monitoring Scripts/Tools
DEV STAGE PROD
Dev & QA NOC/Ops Dev
Promote
approved
jobs
Self-Service Self-Service
Empower
Strategic: Improve incident response times
https://youtu.be/USYrDaPEFtM
Jody Mulkey at DOES ‘15 SF
Services Monitoring Scripts/Tools Services Monitoring Scripts/ToolsServices Monitoring Scripts/Tools
DEV STAGE PROD
Dev & QA NOC/Ops Dev
Promote
approved
jobs
Self-Service Self-Service
Empower
• Reduced MTTR by 92%

• Reduced escalations by 50%

• Reduced overall support costs by 55%
Strategic: Reduce compliance burden & improve
Shaun Norris at DOES ‘18 Las Vegas
https://youtu.be/d5IMvK0YHTg
Strategic: Reduce compliance burden & improve
Shaun Norris at DOES ‘18 Las Vegas
https://youtu.be/d5IMvK0YHTg
Optimized for compliance
• 86,000+ employees

• 60+ countries

• Highly regulated
Strategic: Reduce compliance burden & improve
Shaun Norris at DOES ‘18 Las Vegas
https://youtu.be/d5IMvK0YHTg
Optimized for compliance
• 86,000+ employees

• 60+ countries

• Highly regulated
LOB #1
LOB #2 LOB #3
LOB …n
Services Scripts/Tools
Data Center
Services Scripts/Tools
Data Center
Services Scripts/Tools
Data Center Services Scripts/Tools
Cloud
Services Scripts/Tools
Cloud
Services Scripts/Tools
Cloud
Services Scripts/Tools
Cloud
Self-Service
ComplianceConsistency
Strategic: Reduce compliance burden & improve
Shaun Norris at DOES ‘18 Las Vegas
https://youtu.be/d5IMvK0YHTg
Optimized for compliance
• 86,000+ employees

• 60+ countries

• Highly regulated
LOB #1
LOB #2 LOB #3
LOB …n
Services Scripts/Tools
Data Center
Services Scripts/Tools
Data Center
Services Scripts/Tools
Data Center Services Scripts/Tools
Cloud
Services Scripts/Tools
Cloud
Services Scripts/Tools
Cloud
Services Scripts/Tools
Cloud
Self-Service
ComplianceConsistency
12 months: 

• Saved 28 person years of time

• 13,000+ ops tasks in privileged environments that
didn’t require a review

• ~200 less customer impacting events
Recap: Make Tomorrow Better Than Today
SRE is more than a title
Be practical and start focusing
on toil
Find and fix toil anti-patterns
Error Budgets and Toil Limits
Apply Self-Service Operations
design pattern
Toil Engineering Work
E.W.Toil
Reduce toil
Improve the business ǡ
No capacity to reduce toil
No capacity to improve business
Toil at manageable percentage of capacity
Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”)
SRE is a new way to think
about Ops work
ITIL Book 1
ITIL Book 2
ITIL Book 3
ITIL Book 4
ITIL Book 5
Quality!
is job
#1
Sys
Admin
CAB CALENDAR
Your new title is SRE.
Now write code and be better at ops.
PROVISIONING PROCESS
Dilbert characters © Scott Adams www.dilbert.com
1. SRE needs Service Level
Objectives, with consequences

2. SREs have time to make
tomorrow better than today

3. SRE teams have the ability to
regulate their workload
0
100
Service Level Objective
Error Budget*
Service Level Indicator
(*Use this to improve the service)
Done.I need you
to do X
Your
other
work
I need you
to do X
I need you
to do X
Ticket
Do X
Later…
Do X
Do X
Done.
Done.
Your
other
work
Self-Service
Self-Service
Self-Service
Your
other
work x2
Your
other
work x3
Later…Later…
Later…
Your
other
work
Your
other
work
After
Before
Wait Interrupt
Ticket
Wait Interrupt
Ticket
Wait Interrupt
Consumer of
Ops Capabilities
Self-Service Operation
On
Demand
Ops Capability
Specialist
Knowledge
Ops Capability
Specialist
Knowledge
Toil
Let’s talk…
@damonedwards
damon@rundeck.com

More Related Content

What's hot

Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...DevOpsDays Tel Aviv
 
Site reliability engineering
Site reliability engineeringSite reliability engineering
Site reliability engineeringJason Loeffler
 
Building an SRE Organization @ Squarespace
Building an SRE Organization @ SquarespaceBuilding an SRE Organization @ Squarespace
Building an SRE Organization @ SquarespaceFranklin Angulo
 
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...Tori Wieldt
 
How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)Setyo Legowo
 
SRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil EliminationSRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil EliminationDr Ganesh Iyer
 
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...New Relic
 
Site (Service) Reliability Engineering
Site (Service) Reliability EngineeringSite (Service) Reliability Engineering
Site (Service) Reliability EngineeringMark Underwood
 
SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...DevClub_lv
 
SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)Hussain Mansoor
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Rundeck
 
Rapid Strategic SRE Assessments
Rapid Strategic SRE AssessmentsRapid Strategic SRE Assessments
Rapid Strategic SRE AssessmentsMarc Hornbeek
 
SRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overviewSRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overviewDr Ganesh Iyer
 
Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Abeer R
 
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7NUS-ISS
 
Cloud Native Engineering with SRE and GitOps
Cloud Native Engineering with SRE and GitOpsCloud Native Engineering with SRE and GitOps
Cloud Native Engineering with SRE and GitOpsWeaveworks
 
Chaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSChaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSBilal Aybar
 

What's hot (20)

Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
 
Site reliability engineering
Site reliability engineeringSite reliability engineering
Site reliability engineering
 
Building an SRE Organization @ Squarespace
Building an SRE Organization @ SquarespaceBuilding an SRE Organization @ Squarespace
Building an SRE Organization @ Squarespace
 
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...
 
How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)
 
SRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil EliminationSRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil Elimination
 
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
 
Site (Service) Reliability Engineering
Site (Service) Reliability EngineeringSite (Service) Reliability Engineering
Site (Service) Reliability Engineering
 
SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...
 
SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE
 
SRE in Startup
SRE in StartupSRE in Startup
SRE in Startup
 
Rapid Strategic SRE Assessments
Rapid Strategic SRE AssessmentsRapid Strategic SRE Assessments
Rapid Strategic SRE Assessments
 
SRE vs DevOps
SRE vs DevOpsSRE vs DevOps
SRE vs DevOps
 
SRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overviewSRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overview
 
Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)
 
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
 
Sre summary
Sre summarySre summary
Sre summary
 
Cloud Native Engineering with SRE and GitOps
Cloud Native Engineering with SRE and GitOpsCloud Native Engineering with SRE and GitOps
Cloud Native Engineering with SRE and GitOps
 
Chaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSChaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWS
 

Similar to SRE for Everyone: Making Tomorrow Better Than Today

SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today
SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today  SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today
SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today Rundeck
 
SRE Lessons for the Enterprise
SRE Lessons for the Enterprise SRE Lessons for the Enterprise
SRE Lessons for the Enterprise Rundeck
 
Clearing the Way For SRE In the Enterprise
Clearing the Way For SRE In the Enterprise Clearing the Way For SRE In the Enterprise
Clearing the Way For SRE In the Enterprise Rundeck
 
2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1
2019-11 NewOpsDays Dallas  - Sysadmin to SRE _v1.12019-11 NewOpsDays Dallas  - Sysadmin to SRE _v1.1
2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1Jorn Knuttila
 
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...Jorn Knuttila
 
Getting Senior Management Support for It projects.pptx
Getting Senior Management Support for It projects.pptxGetting Senior Management Support for It projects.pptx
Getting Senior Management Support for It projects.pptxAmauryMarque
 
Clean Code - 5
Clean Code - 5Clean Code - 5
Clean Code - 5Don Kim
 
Let's bring the teams back together
Let's bring the teams back togetherLet's bring the teams back together
Let's bring the teams back togetherKris Buytaert
 
Team Capability Assessment PowerPoint Presentation Slides
Team Capability Assessment PowerPoint Presentation Slides Team Capability Assessment PowerPoint Presentation Slides
Team Capability Assessment PowerPoint Presentation Slides SlideTeam
 
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean LeffingwellBe Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean LeffingwellAgile Software Community of India
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Rundeck
 
EVO Energy Consulting Brand Development
EVO Energy Consulting Brand DevelopmentEVO Energy Consulting Brand Development
EVO Energy Consulting Brand DevelopmentRandy Stuart
 
Agile v agility_v4_md
Agile v agility_v4_mdAgile v agility_v4_md
Agile v agility_v4_mdMarc Danziger
 
Competitor analysis
Competitor analysisCompetitor analysis
Competitor analysisNileshShaw
 
The 7 Deadly Sins Of Almost Being Agile
The 7 Deadly Sins Of Almost Being AgileThe 7 Deadly Sins Of Almost Being Agile
The 7 Deadly Sins Of Almost Being Agilelazygolfer
 
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...NUS-ISS
 
VMUG Melbourne - DevOps - Not Just for Open Source and Unicorns
VMUG Melbourne - DevOps - Not Just for Open Source and UnicornsVMUG Melbourne - DevOps - Not Just for Open Source and Unicorns
VMUG Melbourne - DevOps - Not Just for Open Source and UnicornsJosh Atwell
 
The Last Mile Continued: Incident Management
The Last Mile Continued: Incident Management The Last Mile Continued: Incident Management
The Last Mile Continued: Incident Management Rundeck
 
The Agile Manifesto in the Star Wars Universe
The Agile Manifesto in the Star Wars UniverseThe Agile Manifesto in the Star Wars Universe
The Agile Manifesto in the Star Wars UniverseAaron Griffith
 

Similar to SRE for Everyone: Making Tomorrow Better Than Today (20)

SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today
SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today  SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today
SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today
 
SRE Lessons for the Enterprise
SRE Lessons for the Enterprise SRE Lessons for the Enterprise
SRE Lessons for the Enterprise
 
Clearing the Way For SRE In the Enterprise
Clearing the Way For SRE In the Enterprise Clearing the Way For SRE In the Enterprise
Clearing the Way For SRE In the Enterprise
 
2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1
2019-11 NewOpsDays Dallas  - Sysadmin to SRE _v1.12019-11 NewOpsDays Dallas  - Sysadmin to SRE _v1.1
2019-11 NewOpsDays Dallas - Sysadmin to SRE _v1.1
 
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow...
 
Getting Senior Management Support for It projects.pptx
Getting Senior Management Support for It projects.pptxGetting Senior Management Support for It projects.pptx
Getting Senior Management Support for It projects.pptx
 
Clean Code - 5
Clean Code - 5Clean Code - 5
Clean Code - 5
 
Let's bring the teams back together
Let's bring the teams back togetherLet's bring the teams back together
Let's bring the teams back together
 
Team Capability Assessment PowerPoint Presentation Slides
Team Capability Assessment PowerPoint Presentation Slides Team Capability Assessment PowerPoint Presentation Slides
Team Capability Assessment PowerPoint Presentation Slides
 
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean LeffingwellBe Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
Be Agile. Scale Up. Stay Lean. And Have More Fun by Dean Leffingwell
 
Introduction to Agile
Introduction to AgileIntroduction to Agile
Introduction to Agile
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE
 
EVO Energy Consulting Brand Development
EVO Energy Consulting Brand DevelopmentEVO Energy Consulting Brand Development
EVO Energy Consulting Brand Development
 
Agile v agility_v4_md
Agile v agility_v4_mdAgile v agility_v4_md
Agile v agility_v4_md
 
Competitor analysis
Competitor analysisCompetitor analysis
Competitor analysis
 
The 7 Deadly Sins Of Almost Being Agile
The 7 Deadly Sins Of Almost Being AgileThe 7 Deadly Sins Of Almost Being Agile
The 7 Deadly Sins Of Almost Being Agile
 
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
 
VMUG Melbourne - DevOps - Not Just for Open Source and Unicorns
VMUG Melbourne - DevOps - Not Just for Open Source and UnicornsVMUG Melbourne - DevOps - Not Just for Open Source and Unicorns
VMUG Melbourne - DevOps - Not Just for Open Source and Unicorns
 
The Last Mile Continued: Incident Management
The Last Mile Continued: Incident Management The Last Mile Continued: Incident Management
The Last Mile Continued: Incident Management
 
The Agile Manifesto in the Star Wars Universe
The Agile Manifesto in the Star Wars UniverseThe Agile Manifesto in the Star Wars Universe
The Agile Manifesto in the Star Wars Universe
 

More from Rundeck

Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck Community Office Hours:  Using Variables with Job Steps Rundeck Community Office Hours:  Using Variables with Job Steps
Rundeck Community Office Hours: Using Variables with Job Steps Rundeck
 
Introducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationIntroducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationRundeck
 
How to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckHow to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckRundeck
 
Lunch and learn: Getting started with Rundeck & Ansible
Lunch and learn:  Getting started with Rundeck & AnsibleLunch and learn:  Getting started with Rundeck & Ansible
Lunch and learn: Getting started with Rundeck & AnsibleRundeck
 
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...Rundeck
 
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck Office Hours:  Best Practices Access Control PoliciesRundeck Office Hours:  Best Practices Access Control Policies
Rundeck Office Hours: Best Practices Access Control PoliciesRundeck
 
Mastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckMastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckRundeck
 
What's New in Rundeck 3.4
What's New in Rundeck 3.4   What's New in Rundeck 3.4
What's New in Rundeck 3.4 Rundeck
 
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...Rundeck
 
Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Rundeck
 
Introduction to Rundeck
Introduction to Rundeck Introduction to Rundeck
Introduction to Rundeck Rundeck
 
Automated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuAutomated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuRundeck
 
Modernizing Incident Response
Modernizing Incident Response Modernizing Incident Response
Modernizing Incident Response Rundeck
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Rundeck
 
Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Rundeck
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck OverviewRundeck
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationEmpower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationRundeck
 
Advanced Cluster Settings
Advanced Cluster Settings Advanced Cluster Settings
Advanced Cluster Settings Rundeck
 
Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Rundeck
 
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Rundeck
 

More from Rundeck (20)

Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck Community Office Hours:  Using Variables with Job Steps Rundeck Community Office Hours:  Using Variables with Job Steps
Rundeck Community Office Hours: Using Variables with Job Steps
 
Introducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationIntroducing PagerDuty Process Automation
Introducing PagerDuty Process Automation
 
How to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckHow to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in Rundeck
 
Lunch and learn: Getting started with Rundeck & Ansible
Lunch and learn:  Getting started with Rundeck & AnsibleLunch and learn:  Getting started with Rundeck & Ansible
Lunch and learn: Getting started with Rundeck & Ansible
 
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
 
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck Office Hours:  Best Practices Access Control PoliciesRundeck Office Hours:  Best Practices Access Control Policies
Rundeck Office Hours: Best Practices Access Control Policies
 
Mastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckMastering Secrets Management in Rundeck
Mastering Secrets Management in Rundeck
 
What's New in Rundeck 3.4
What's New in Rundeck 3.4   What's New in Rundeck 3.4
What's New in Rundeck 3.4
 
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
 
Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation
 
Introduction to Rundeck
Introduction to Rundeck Introduction to Rundeck
Introduction to Rundeck
 
Automated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuAutomated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + Sensu
 
Modernizing Incident Response
Modernizing Incident Response Modernizing Incident Response
Modernizing Incident Response
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
 
Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck Overview
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationEmpower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
 
Advanced Cluster Settings
Advanced Cluster Settings Advanced Cluster Settings
Advanced Cluster Settings
 
Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration
 
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
 

Recently uploaded

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Recently uploaded (20)

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

SRE for Everyone: Making Tomorrow Better Than Today

  • 1. SRE for Everyone: Making Tomorrow Better Than Today Damon Edwards @damonedwards 2019
  • 2.
  • 3. Not that far away, maybe in a company just like yours…
  • 4. Not that far away, maybe in a company just like yours… Overloaded. Constant firefighting. Ticket Ticket Project A ··· Project B ··· Ticket Ticket Ticket Ticket Ticket Ticket Ticket Ticket Ticket Ticket Ticket DUE: Yesterday! DUE: Tomorrow! Ticket Ticket Ticket
  • 5. Waiting in ticket queues for everything. Not that far away, maybe in a company just like yours…
  • 6. Waiting in ticket queues for everything. Ticket Not that far away, maybe in a company just like yours…
  • 7. Waiting in ticket queues for everything. Ticket Ticket Ticket Ticket Ticket Ticket Not that far away, maybe in a company just like yours…
  • 8. Things break. Break again. And again. Later… Later… same same Help! Ticket Wait Interrupt Help! Ticket Wait Interrupt Help! Ticket Wait Interrupt Not that far away, maybe in a company just like yours…
  • 9. Everyone is busy, but it doesn’t get any better. Improvement Project Business Delivery Incidents Business Delivery Business Delivery Not that far away, maybe in a company just like yours…
  • 10. Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. Not that far away, maybe in a company just like yours…
  • 11. Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. Not that far away, maybe in a company just like yours… Everything takes too long, costs too much, and breaks too often! Executives Have you heard of SRE? Google does it.
  • 12.
  • 13. “SRE… When you ask software engineers to do operations” “SRE… Next-generation, cloud-native Operations” Class SRE implements DevOps “SRE… When Ops does more engineering than Ops”
  • 14. “SRE… When you ask software engineers to do operations” “SRE… Next-generation, cloud-native Operations” Class SRE implements DevOps “SRE… When Ops does more engineering than Ops” SRE
  • 15. Have you heard of SRE? Google does it.
  • 20. ITIL Book 1 ITIL Book 2 ITIL Book 3 ITIL Book 4 ITIL Book 5 Quality! is job #1 Sys Admin CAB CALENDAR Your new title is SRE. Now write code and be better at ops. PROVISIONING PROCESS Dilbert characters © Scott Adams www.dilbert.com Sys Admin CAB CALENDAR our new title is SRE. w write code and be better at ops. PROVISIONING PROCESS Dilbert characters © Scott Adams www.dilbert.com
  • 21. SysAdmins Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. ansformation has largely nored Ops. Any ideas? Have you heard of SRE? Google does it. Everything takes too long, cost too much, and break too often! Executive View
  • 22. SysAdmins Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. ansformation has largely nored Ops. Any ideas? Have you heard of SRE? Google does it. Everything takes too long, cost too much, and break too often! Executive View SRE (new name) Overloaded. Constant firefighting. Waiting in ticket queues for everything. Things break. Break again. And again. Everyone is busy, but it doesn’t get any better. Our transformation has largely ignored Ops. Any ideas? Have you h Google Everything takes too long, cost too much, and break too often! Executive View
  • 23. Changing job titles or adding individual skills doesn’t make systems administrators SREs.
  • 24. Changing job titles or adding individual skills doesn’t make systems administrators SREs.
  • 25. Changing job titles or adding individual skills doesn’t make systems administrators SREs. Observability Programming Skills Distributed Systems Arch. Blameless Post-Mortems
  • 26. Changing job titles or adding individual skills doesn’t make systems administrators SREs. Observability Programming Skills Distributed Systems Arch. Blameless Post-Mortems 000000000000000
  • 27. Changing job titles or adding individual skills doesn’t make systems administrators SREs. Not SRE Observability Programming Skills Distributed Systems Arch. Blameless Post-Mortems 000000000000000
  • 28. Changing job titles or adding individual skills doesn’t make systems administrators SREs.
  • 29. Changing job titles or adding individual skills doesn’t make systems administrators SREs. SRE is a rethinking of how Operations work gets done.
  • 30. Principles are what makes SRE different
  • 31. Principles are what makes SRE different Stephen Thorne, Google At DevOps Enterprise Summit London 2018 “Principles of SRE” https://youtu.be/c-w_GYvi0eA
  • 32. Principles are what makes SRE different 1. SRE needs Service Level Objectives, with consequences Stephen Thorne, Google At DevOps Enterprise Summit London 2018 “Principles of SRE” https://youtu.be/c-w_GYvi0eA
  • 33. SLO and Error Budgets: Tools for Shared Responsibility 0 100 Service Level Objective Error Budget* Service Level Indicator (*Use this to improve the service)
  • 34. SLO and Error Budgets: Tools for Shared Responsibility 0 100 Service Level Objective Error Budget* Service Level Indicator (*Use this to improve the service)
  • 35. SLO and Error Budgets: Tools for Shared Responsibility 0 100 Service Level Objective Error Budget* Service Level Indicator (*Use this to improve the service) DEV BIZ Ops
  • 36. SLO and Error Budgets: Tools for Shared Responsibility 0 100 Service Level Objective Error Budget* Service Level Indicator (*Use this to improve the service) DEV BIZ Ops SLO takes priority!!
  • 37. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences
  • 38. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today
  • 39. Toil: Name For a Problem We’ve All Felt
  • 40. Toil: Name For a Problem We’ve All Felt “Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” -Vivek Rau Google
  • 41. Toil vs. Engineering Work Toil Engineering Work Lacks Enduring Value Builds Enduring Value Rote, Repetitive Creative, Iterative Tactical Strategic Increases With Scale Enables Scaling Can Be Automated Requires Human Creativity
  • 42. Excessive Toil Prevents Fixing the System Toil Engineering Work E.W.Toil Reduce toil Improve the business ǡ No capacity to reduce toil No capacity to improve business Toil at manageable percentage of capacity Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”)
  • 43. Excessive Toil Prevents Fixing the System Toil Engineering Work E.W.Toil Reduce toil Improve the business ǡ No capacity to reduce toil No capacity to improve business Toil at manageable percentage of capacity Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”)
  • 44. Excessive Toil Prevents Fixing the System Toil Engineering Work E.W.Toil Reduce toil Improve the business ǡ No capacity to reduce toil No capacity to improve business Toil at manageable percentage of capacity Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”) Downward spiral is inevitable!
  • 45. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today
  • 46. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload
  • 47. SRE teams have the ability to regulate their workload
  • 48. SRE teams have the ability to regulate their workload Example:
  • 49. SRE teams have the ability to regulate their workload Example: What if handing-off responsibility to SRE/Ops wasn’t a right?
  • 50. SRE teams have the ability to regulate their workload Example: What if handing-off responsibility to SRE/Ops wasn’t a right? (separate the “running in production” from “run by SRE/Ops”)
  • 51. SRE teams have the ability to regulate their workload Example: What if handing-off responsibility to SRE/Ops wasn’t a right? (separate the “running in production” from “run by SRE/Ops”) “?!?”
  • 52. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload
  • 53. Where to start (the practical approach)
  • 54. Where to start (the practical approach) 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload
  • 55. Where to start (the practical approach) 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload Company-wide culture change (hard!)
  • 56. Where to start (the practical approach) 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload Company-wide culture change (hard!) Company-wide culture change (hard!)
  • 57. Where to start (the practical approach) 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload Company-wide culture change (hard!) Company-wide culture change (hard!) Reduce toil.
 Everybody wins!
  • 58. Where to start (the practical approach) 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload Company-wide culture change (hard!) Company-wide culture change (hard!) Reduce toil.
 Everybody wins!
  • 59. Why focus on reducing toil?
  • 60. Why focus on reducing toil? 1. Lots of value independent of “SRE”
  • 61. 2. Your people are you most expensive assets
 … stay out of their way! Why focus on reducing toil? 1. Lots of value independent of “SRE”
  • 62. Start reducing toil today Toil
  • 63. Start reducing toil today 1. Track toil levels for each team Toil
  • 64. Start reducing toil today 1. Track toil levels for each team Toil
  • 65. Track toil levels for each team
  • 66. Track toil levels for each team • Standardize (e.g. meetings and email are “overhead" not “toil”)
  • 67. Track toil levels for each team • Standardize (e.g. meetings and email are “overhead" not “toil”) • Track • Self-reporting • Periodic surveys • SM or PM interview/sampling
  • 68. Track toil levels for each team • Standardize (e.g. meetings and email are “overhead" not “toil”) • Track • Self-reporting • Periodic surveys • SM or PM interview/sampling • Don’t get lost in time tracking weeds!
  • 69. Start reducing toil today 1. Track toil levels for each team Toil
  • 70. Start reducing toil today 1. Track toil levels for each team Toil 2. Set toil limit for each team (50% is conventional wisdom)
  • 71. Start reducing toil today 1. Track toil levels for each team 2. Set toil limit for each team (50% is conventional wisdom) 3. Fund efforts to reduce toil (with emphasis on teams already over limit) Toil
  • 72. Start reducing toil today 1. Track toil levels for each team 2. Set toil limit for each team (50% is conventional wisdom) 3. Fund efforts to reduce toil (with emphasis on teams already over limit) Toil Michael Kehoe Todd Palino (LinkedIn) At SREcon Americas 2019 Example Process “Code Yellow”
  • 75. Where to focus? Toil Reduce Technical Debt Re-Engineer Processes
  • 76. Where to focus? Toil Reduce Technical Debt Re-Engineer Processes Enable Self-Service
  • 77. Where to focus? Toil Reduce Technical Debt Re-Engineer Processes Enable Self-Service
  • 78.
  • 82. How to enable self-service? Empower teams to spot and fix the anti-patterns.
  • 83. “Do this for me, do it again, then do it again.” Done.I need you to do X Your other work I need you to do X I need you to do X Ticket Do X Later… Do X Do X Done. Done. Your other work Self-Service Self-Service Self-Service Your other work x2 Your other work x3 Later…Later… Later… Your other work Your other work After Before Wait Interrupt Ticket Wait Interrupt Ticket Wait Interrupt
  • 84. “Do this for me, do it again, then do it again.” Done.I need you to do X Your other work I need you to do X I need you to do X Ticket Do X Later… Do X Do X Done. Done. Your other work Self-Service Self-Service Self-Service Your other work x2 Your other work x3 Later…Later… Later… Your other work Your other work After Before Wait Interrupt Ticket Wait Interrupt Ticket Wait Interrupt
  • 85. “I could fix it, but I can’t get to it.” Environment I could fix it if I could get to it Before Wait Interrupt
  • 86. “I could fix it, but I can’t get to it.” Environment I could fix it if I could get to it Before Wait Interrupt After I’ve got this! Environment Self- Service
  • 87. “The dog-pile.” !! I think its a problem with db07-store2.uswest.acme “$ top” “$ top” db07store2. uswest.acme “$ top” “$ top” “$ top” !! “$ top” !! !! !! healthcheck store2 -all db07store2. uswest.acme Self-Service 1. 2. 3. I think its a problem with db07-store2.uswest.acme
  • 88. “I’m an expert, I don’t read the wiki.” docs Service has changed. Use this flag or bad things will happen! Pause monitoring first or we all get woken up! “restart -doit -now” I’ve done this before. I’ve got this… Environment docs Later… Before
  • 89. “I’m an expert, I don’t read the wiki.” docs Service has changed. Use this flag or bad things will happen! Pause monitoring first or we all get woken up! “restart -doit -now” I’ve done this before. I’ve got this… Environment docs Later… Before
  • 90. “I’m an expert, I don’t read the wiki.” docs Service has changed. Use this flag or bad things will happen! Pause monitoring first or we all get woken up! “restart -doit -now” I’ve done this before. I’ve got this… Environment docs Later… Before Service has changed. Use this flag or bad things will happen! Pause monitoring first or we all get woken up! “restart” Environment Later… Update Restart Job ✅ I’ve done this before. I’ve got this. Self-Service Self-Service After
  • 91. “Known issue… doesn’t get permanent fix”
  • 92. “Known issue… doesn’t get permanent fix”
  • 93. Self-Service Operations Design Pattern (in a nutshell) Consumer of Ops Capabilities Self-Service On Demand Ops Capability Specialist Knowledge Ops Capability Specialist Knowledge
  • 94. Self-Service Operations Design Pattern (in a nutshell) Pull-Based Consumer of Ops Capabilities Self-Service On Demand Ops Capability Specialist Knowledge Ops Capability Specialist Knowledge
  • 95. Self-Service Operations Design Pattern (in a nutshell) Pull-Based Accept tools/languages that teams want to use Consumer of Ops Capabilities Self-Service On Demand Ops Capability Specialist Knowledge Ops Capability Specialist Knowledge
  • 96. Self-Service Operations Design Pattern (in a nutshell) Pull-Based Accept tools/languages that teams want to use Define “guardrails” to provide work safety Consumer of Ops Capabilities Self-Service On Demand Ops Capability Specialist Knowledge Ops Capability Specialist Knowledge
  • 97. Self-Service Operations Design Pattern (in a nutshell) Pull-Based Accept tools/languages that teams want to use Let people who “push buttons” define the buttons Define “guardrails” to provide work safety Consumer of Ops Capabilities Self-Service On Demand Ops Capability Specialist Knowledge Ops Capability Specialist Knowledge
  • 98. Self-Service Operations Design Pattern (in a nutshell) Pull-Based Accept tools/languages that teams want to use Let people who “push buttons” define the buttons Build in security and compliance Define “guardrails” to provide work safety Consumer of Ops Capabilities Self-Service On Demand Ops Capability Specialist Knowledge Ops Capability Specialist Knowledge
  • 99. Self-Service is ultimately about user experience Consumer of Ops Capabilities Self-Service On Demand Ops Capability Specialist Knowledge Ops Capability Specialist Knowledge
  • 100. Self-Service is ultimately about user experience Consumer of Ops Capabilities Self-Service On Demand Ops Capability Specialist Knowledge Ops Capability Specialist Knowledge 1.Work how they want to work (GUI, API, CLI)
  • 101. Self-Service is ultimately about user experience Consumer of Ops Capabilities Self-Service On Demand Ops Capability Specialist Knowledge Ops Capability Specialist Knowledge 1.Work how they want to work (GUI, API, CLI) 2. “Guardrails” (Smart options that helpfully constrain)
  • 102. Self-Service is ultimately about user experience Consumer of Ops Capabilities Self-Service On Demand Ops Capability Specialist Knowledge Ops Capability Specialist Knowledge 1.Work how they want to work (GUI, API, CLI) 2. “Guardrails” (Smart options that helpfully constrain) 3.Dynamic resource model
 (Up-to-date details of your environment)
  • 103. Self-Service can also be a foundation for strategic initiatives
  • 104. Strategic: Improve incident response times https://youtu.be/USYrDaPEFtM Jody Mulkey at DOES ‘15 SF
  • 105. Strategic: Improve incident response times https://youtu.be/USYrDaPEFtM Jody Mulkey at DOES ‘15 SF Services Monitoring Scripts/Tools Services Monitoring Scripts/ToolsServices Monitoring Scripts/Tools DEV STAGE PROD Dev & QA NOC/Ops Dev Promote approved jobs Self-Service Self-Service Empower
  • 106. Strategic: Improve incident response times https://youtu.be/USYrDaPEFtM Jody Mulkey at DOES ‘15 SF Services Monitoring Scripts/Tools Services Monitoring Scripts/ToolsServices Monitoring Scripts/Tools DEV STAGE PROD Dev & QA NOC/Ops Dev Promote approved jobs Self-Service Self-Service Empower
  • 107. Strategic: Improve incident response times https://youtu.be/USYrDaPEFtM Jody Mulkey at DOES ‘15 SF Services Monitoring Scripts/Tools Services Monitoring Scripts/ToolsServices Monitoring Scripts/Tools DEV STAGE PROD Dev & QA NOC/Ops Dev Promote approved jobs Self-Service Self-Service Empower • Reduced MTTR by 92% • Reduced escalations by 50% • Reduced overall support costs by 55%
  • 108. Strategic: Reduce compliance burden & improve Shaun Norris at DOES ‘18 Las Vegas https://youtu.be/d5IMvK0YHTg
  • 109. Strategic: Reduce compliance burden & improve Shaun Norris at DOES ‘18 Las Vegas https://youtu.be/d5IMvK0YHTg Optimized for compliance • 86,000+ employees • 60+ countries • Highly regulated
  • 110. Strategic: Reduce compliance burden & improve Shaun Norris at DOES ‘18 Las Vegas https://youtu.be/d5IMvK0YHTg Optimized for compliance • 86,000+ employees • 60+ countries • Highly regulated LOB #1 LOB #2 LOB #3 LOB …n Services Scripts/Tools Data Center Services Scripts/Tools Data Center Services Scripts/Tools Data Center Services Scripts/Tools Cloud Services Scripts/Tools Cloud Services Scripts/Tools Cloud Services Scripts/Tools Cloud Self-Service ComplianceConsistency
  • 111. Strategic: Reduce compliance burden & improve Shaun Norris at DOES ‘18 Las Vegas https://youtu.be/d5IMvK0YHTg Optimized for compliance • 86,000+ employees • 60+ countries • Highly regulated LOB #1 LOB #2 LOB #3 LOB …n Services Scripts/Tools Data Center Services Scripts/Tools Data Center Services Scripts/Tools Data Center Services Scripts/Tools Cloud Services Scripts/Tools Cloud Services Scripts/Tools Cloud Services Scripts/Tools Cloud Self-Service ComplianceConsistency 12 months: • Saved 28 person years of time • 13,000+ ops tasks in privileged environments that didn’t require a review • ~200 less customer impacting events
  • 112. Recap: Make Tomorrow Better Than Today SRE is more than a title Be practical and start focusing on toil Find and fix toil anti-patterns Error Budgets and Toil Limits Apply Self-Service Operations design pattern Toil Engineering Work E.W.Toil Reduce toil Improve the business ǡ No capacity to reduce toil No capacity to improve business Toil at manageable percentage of capacity Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”) SRE is a new way to think about Ops work ITIL Book 1 ITIL Book 2 ITIL Book 3 ITIL Book 4 ITIL Book 5 Quality! is job #1 Sys Admin CAB CALENDAR Your new title is SRE. Now write code and be better at ops. PROVISIONING PROCESS Dilbert characters © Scott Adams www.dilbert.com 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload 0 100 Service Level Objective Error Budget* Service Level Indicator (*Use this to improve the service) Done.I need you to do X Your other work I need you to do X I need you to do X Ticket Do X Later… Do X Do X Done. Done. Your other work Self-Service Self-Service Self-Service Your other work x2 Your other work x3 Later…Later… Later… Your other work Your other work After Before Wait Interrupt Ticket Wait Interrupt Ticket Wait Interrupt Consumer of Ops Capabilities Self-Service Operation On Demand Ops Capability Specialist Knowledge Ops Capability Specialist Knowledge Toil