SlideShare a Scribd company logo
1 of 31
Download to read offline
STAGES OF PRACTICESTAGES OF PRACTICE
ININ
SITE RELIABILITYSITE RELIABILITY
ENGINEERINGENGINEERING
Stages of PracticeStages of Practice
Shu (obey/basics) 守
Ha (detach/ready to learn) 破
Ri (separate/intuitive) 離
Stages of PracticeStages of Practice
Innocent
Shu (obey) Novice
Beginner
Competent
Ha (detach) Pro cient
Ri (separate) Master  
Expert/Researcher
Signposts of SRE PracticeSignposts of SRE Practice
Incident Response
Postmortems
Incident Prevention
Service Level Handling:
Indicators, Objectives, Agreements (IOAs)
Monitoring
Capacity Planning and Forecasting
Performance Management
Signposts:Signposts:
Incident ResponseIncident Response
 
(hat tip to J. Paul Reed)
Shu Signposts:Shu Signposts:
Incident ResponseIncident Response
Novice “Alarmed” by incidents
Primarily external sourced with inconsistent
response
Beginner “Fears” incidents
E ective response requires speci c people
Competent “Aware” that incidents are normal
Well de ned handling process
Ha-Ri Signposts:Ha-Ri Signposts:
Incident ResponseIncident Response
Pro cient “Accept” incidents as a normal
Some inter-team coordination planning
Master “Embrace” incidents as learning experiences
Well documented processes and procedures with
learning inputs to the process
Signposts:Signposts:
PostmortemsPostmortems
Shu Signposts:Shu Signposts:
PostmortemsPostmortems
Novice “Blameful”, only for crisis incidents
Looking for a scapegoat
Beginner Only performed for major incidents
Looking for a cause with a focus on mistakes
Competent More common, starting to look past blaming
Focus on improving local processes
Ha-Ri Signposts:Ha-Ri Signposts:
PostmortemsPostmortems
Pro cient “Blameless”, used consistently
Action items feed back to improve systems &
processes
Master Used to derive “meta”-learnings
Applying learnings across the system
Signposts:Signposts:
Incident PreventionIncident Prevention
 
(hat tip to J. Paul Reed)
Shu Signposts:Shu Signposts:
Incident PreventionIncident Prevention
Novice Focus on remediation (docs & metrics) for
manually-identi ed, static, contributory causes
Beginner Documentation done to an “acceptable” level
Static & action-based causes recognized
Competent Focus on team response to incidents, maintaining
docs
Ha-Ri Signposts:Ha-Ri Signposts:
Incident PreventionIncident Prevention
Pro cient Early phases of chaos engineering - scheduled
Limited randomized chaos engineering
Master Chaos engineering as a tool to manage to an SLO
Focus on general hygiene of operational
environment
Signposts:Signposts:
Service Level HandlingService Level Handling
SLIs / SLOs / SLAsSLIs / SLOs / SLAs
Shu Signposts:Shu Signposts:
SL[IOA]sSL[IOA]s
Novice Externally imposed (SLA), if any
On paper, not necessarily measured
May be manually calculated for contractual needs
Beginner Recognizes the di erence in these terms
Measures “easy” things
Competent De ned and measured primary characteristics
Measures internal SLOs (80+%), not just
contractual performance
Ha-Ri Signposts:Ha-Ri Signposts:
SL[IOA]sSL[IOA]s
Pro cient Well developed cascade of measures
Historical record and correlation to events
Master Meaningful measures throughout the system
Signposts:Signposts:
MonitoringMonitoring
Shu Signposts:Shu Signposts:
MonitoringMonitoring
Novice No baseline metrics established
Beginner “OS level” or “out of the box”, inconsistent
monitoring
Partial baselines being developed
Competent Consistent baseline monitoring across entire
system
Able to determine statistical anomalies
Ha-Ri Signposts:Ha-Ri Signposts:
MonitoringMonitoring
Pro cient Thorough instrumentation of all service components
Able to correlate internal and external measures
Master Data observable upon demand
Automated correlation and anomaly detection
Signposts:Signposts:
Capacity PlanningCapacity Planning
and Forecastingand Forecasting
Shu Signposts:Shu Signposts:
Capacity Planning and ForecastingCapacity Planning and Forecasting
Novice Frequently running out of resources
Throw hardware at the problem
Beginner Reactive, manual and/or time-consuming
Some metrics, but incomplete
Competent Able to identify and react to danger situations
before crisis
Coverage of the most critical 20-50%
Ha-Ri Signposts:Ha-Ri Signposts:
Capacity Planning and ForecastingCapacity Planning and Forecasting
Pro cient Nearly complete coverage
Able to reliably predict capacity in the short term (1+
purchase cycles)
Established methods for new service/feature
handling
Master Able to reliably predict capacity 3-4 purchase cycles
out
Signposts:Signposts:
Performance ManagementPerformance Management
Shu Signposts:Shu Signposts:
Performance ManagementPerformance Management
Novice “The site is up, isn’t that good enough?”
Beginner Selective monitoring
Competent Comprehensive monitoring, but still reactive on
regressions
Ha-Ri Signposts:Ha-Ri Signposts:
Capacity Planning and ForecastingCapacity Planning and Forecasting
Pro cient Able to prevent regressions through pre- release
analysis/modeling/validation
Initial “large market” segmentation
Master Finely granular performance monitoring and
management
Assessing Your Organization’sAssessing Your Organization’s
Level of PracticeLevel of Practice
Mock Assessment Search
Monitoring
Novice
Competent
Pro..
BeginnerBeg..
I-Response
Beginner
SLx
Novice
Beginner
I-Prevent PM
Com..
NoviceNovice
Other Potential Areas to EvaluateOther Potential Areas to Evaluate
Observability Tooling & Capabilities
Error Budget De nition and Usage
Continuous Budget Tracking (vs. binary Bang-bang handling)
Change Management Practices
Full Service Lifecycle Reliability:
Do Your Services “Plan for Retirement”?
Even More Potential Areas toEven More Potential Areas to
EvaluateEvaluate
New Services: Intro to Stability Arc
Toil Fraction
Oncall Sustainability
Ryan Franz (Etsy): Mean Time to Sleep (MTTS)
EachEach ‘9’‘9’ will cost you morewill cost you more
than the one before it…than the one before it…
 
Org-wide Practice Adoption ?Org-wide Practice Adoption ?
Practices Under Pressure?Practices Under Pressure?
WillWill ChangeChange Ever Stop?Ever Stop?
The root cause for both the functioning and malfunctions in all
complex systems is impermanence (…all systems are changeable by
nature). Knowing the root cause, we no longer seek it, and instead
look for the many conditions that allowed a particular situation to
manifest. We accept that not all conditions are knowable or xable.
– Dave Zwieback Beyond Blame
. . . and that’s a. . . and that’s a goodgood thingthing
. . . the real story . . . goes on forever . . .
every chapter is better than the one before.
Continuing the conversation. . .Continuing the conversation. . .
Twitter: @DrKurtATwitter: @DrKurtA
https://linkedin.com/in/kurta1

More Related Content

Similar to Assessing stages of practice

Security Audit Best-Practices
Security Audit Best-PracticesSecurity Audit Best-Practices
Security Audit Best-PracticesMarco Raposo
 
Session W1 - Reliable Risk Quantification For Project Cost and Schedule
Session W1 - Reliable Risk Quantification For Project Cost and ScheduleSession W1 - Reliable Risk Quantification For Project Cost and Schedule
Session W1 - Reliable Risk Quantification For Project Cost and ScheduleProject Controls Expo
 
Iseb, ISTQB Static Testing
Iseb, ISTQB Static TestingIseb, ISTQB Static Testing
Iseb, ISTQB Static Testingonsoftwaretest
 
Quality tools-asq-london-may-10-2006
Quality tools-asq-london-may-10-2006Quality tools-asq-london-may-10-2006
Quality tools-asq-london-may-10-2006Omnex Inc.
 
ISTQB / ISEB Foundation Exam Practice
ISTQB / ISEB Foundation Exam PracticeISTQB / ISEB Foundation Exam Practice
ISTQB / ISEB Foundation Exam PracticeYogindernath Gupta
 
Risk Assessment Framework
Risk Assessment FrameworkRisk Assessment Framework
Risk Assessment FrameworkJhurt7103
 
Incident Response and SAP Systems
Incident Response and SAP SystemsIncident Response and SAP Systems
Incident Response and SAP SystemsOnapsis Inc.
 
My Vision
My VisionMy Vision
My Visionmelynch
 
Optimizing Your SOA with Event Processing
Optimizing Your SOA with Event ProcessingOptimizing Your SOA with Event Processing
Optimizing Your SOA with Event ProcessingTim Bass
 
Risk Based Monitoring presentation by Triumph Research Intelligence January 2014
Risk Based Monitoring presentation by Triumph Research Intelligence January 2014Risk Based Monitoring presentation by Triumph Research Intelligence January 2014
Risk Based Monitoring presentation by Triumph Research Intelligence January 2014Triumph Consultancy Services
 
2008 Ergonomics Society - Staffing assessments and supervision
2008 Ergonomics Society - Staffing assessments and supervision2008 Ergonomics Society - Staffing assessments and supervision
2008 Ergonomics Society - Staffing assessments and supervisionAndy Brazier
 
PSD OpRisk Forum presentation 2016
PSD OpRisk Forum presentation 2016PSD OpRisk Forum presentation 2016
PSD OpRisk Forum presentation 2016PSD Group Ltd
 
Lean Six Sigma Course Training Part 16
Lean Six Sigma Course Training Part 16Lean Six Sigma Course Training Part 16
Lean Six Sigma Course Training Part 16Lean Insight
 

Similar to Assessing stages of practice (20)

Hazop_study_introduction
Hazop_study_introductionHazop_study_introduction
Hazop_study_introduction
 
Security Audit Best-Practices
Security Audit Best-PracticesSecurity Audit Best-Practices
Security Audit Best-Practices
 
Session W1 - Reliable Risk Quantification For Project Cost and Schedule
Session W1 - Reliable Risk Quantification For Project Cost and ScheduleSession W1 - Reliable Risk Quantification For Project Cost and Schedule
Session W1 - Reliable Risk Quantification For Project Cost and Schedule
 
Iseb, ISTQB Static Testing
Iseb, ISTQB Static TestingIseb, ISTQB Static Testing
Iseb, ISTQB Static Testing
 
Quality tools-asq-london-may-10-2006
Quality tools-asq-london-may-10-2006Quality tools-asq-london-may-10-2006
Quality tools-asq-london-may-10-2006
 
ISTQB / ISEB Foundation Exam Practice
ISTQB / ISEB Foundation Exam PracticeISTQB / ISEB Foundation Exam Practice
ISTQB / ISEB Foundation Exam Practice
 
Risk Assessment Framework
Risk Assessment FrameworkRisk Assessment Framework
Risk Assessment Framework
 
3A - Turning Data into Decisions - Implementing a Cloud-based HSE Leading Ind...
3A - Turning Data into Decisions - Implementing a Cloud-based HSE Leading Ind...3A - Turning Data into Decisions - Implementing a Cloud-based HSE Leading Ind...
3A - Turning Data into Decisions - Implementing a Cloud-based HSE Leading Ind...
 
Incident Response and SAP Systems
Incident Response and SAP SystemsIncident Response and SAP Systems
Incident Response and SAP Systems
 
Internal audit
Internal auditInternal audit
Internal audit
 
Root cause analysis
Root cause analysisRoot cause analysis
Root cause analysis
 
Make_a_PM_Resolution_for_2007
Make_a_PM_Resolution_for_2007Make_a_PM_Resolution_for_2007
Make_a_PM_Resolution_for_2007
 
Quality tools
Quality toolsQuality tools
Quality tools
 
Test Framework V0.1
Test Framework V0.1Test Framework V0.1
Test Framework V0.1
 
My Vision
My VisionMy Vision
My Vision
 
Optimizing Your SOA with Event Processing
Optimizing Your SOA with Event ProcessingOptimizing Your SOA with Event Processing
Optimizing Your SOA with Event Processing
 
Risk Based Monitoring presentation by Triumph Research Intelligence January 2014
Risk Based Monitoring presentation by Triumph Research Intelligence January 2014Risk Based Monitoring presentation by Triumph Research Intelligence January 2014
Risk Based Monitoring presentation by Triumph Research Intelligence January 2014
 
2008 Ergonomics Society - Staffing assessments and supervision
2008 Ergonomics Society - Staffing assessments and supervision2008 Ergonomics Society - Staffing assessments and supervision
2008 Ergonomics Society - Staffing assessments and supervision
 
PSD OpRisk Forum presentation 2016
PSD OpRisk Forum presentation 2016PSD OpRisk Forum presentation 2016
PSD OpRisk Forum presentation 2016
 
Lean Six Sigma Course Training Part 16
Lean Six Sigma Course Training Part 16Lean Six Sigma Course Training Part 16
Lean Six Sigma Course Training Part 16
 

More from Kurt Andersen

Collective Mindfulness for Better Decision Making
Collective Mindfulness for Better Decision MakingCollective Mindfulness for Better Decision Making
Collective Mindfulness for Better Decision MakingKurt Andersen
 
How bad is your toil? Measuring the Human Impact of Process
How bad is your toil? Measuring the Human Impact of ProcessHow bad is your toil? Measuring the Human Impact of Process
How bad is your toil? Measuring the Human Impact of ProcessKurt Andersen
 
Facilitating DevOps Execution in an All Digital Environment
Facilitating DevOps Execution in an All Digital EnvironmentFacilitating DevOps Execution in an All Digital Environment
Facilitating DevOps Execution in an All Digital EnvironmentKurt Andersen
 
Lessons from Iraq - Building & Running SRE Teams
Lessons from Iraq - Building & Running SRE TeamsLessons from Iraq - Building & Running SRE Teams
Lessons from Iraq - Building & Running SRE TeamsKurt Andersen
 
What You Need to Know About Email Authentication
What You Need to Know About Email AuthenticationWhat You Need to Know About Email Authentication
What You Need to Know About Email AuthenticationKurt Andersen
 
Weeping Angels of Site Reliability
Weeping Angels of Site ReliabilityWeeping Angels of Site Reliability
Weeping Angels of Site ReliabilityKurt Andersen
 
Join us at #SREcon15
Join us at #SREcon15Join us at #SREcon15
Join us at #SREcon15Kurt Andersen
 
Fighting Email Abuse with DMARC
Fighting Email Abuse with DMARCFighting Email Abuse with DMARC
Fighting Email Abuse with DMARCKurt Andersen
 
Operational Costs of Technical Debt
Operational Costs of Technical DebtOperational Costs of Technical Debt
Operational Costs of Technical DebtKurt Andersen
 

More from Kurt Andersen (9)

Collective Mindfulness for Better Decision Making
Collective Mindfulness for Better Decision MakingCollective Mindfulness for Better Decision Making
Collective Mindfulness for Better Decision Making
 
How bad is your toil? Measuring the Human Impact of Process
How bad is your toil? Measuring the Human Impact of ProcessHow bad is your toil? Measuring the Human Impact of Process
How bad is your toil? Measuring the Human Impact of Process
 
Facilitating DevOps Execution in an All Digital Environment
Facilitating DevOps Execution in an All Digital EnvironmentFacilitating DevOps Execution in an All Digital Environment
Facilitating DevOps Execution in an All Digital Environment
 
Lessons from Iraq - Building & Running SRE Teams
Lessons from Iraq - Building & Running SRE TeamsLessons from Iraq - Building & Running SRE Teams
Lessons from Iraq - Building & Running SRE Teams
 
What You Need to Know About Email Authentication
What You Need to Know About Email AuthenticationWhat You Need to Know About Email Authentication
What You Need to Know About Email Authentication
 
Weeping Angels of Site Reliability
Weeping Angels of Site ReliabilityWeeping Angels of Site Reliability
Weeping Angels of Site Reliability
 
Join us at #SREcon15
Join us at #SREcon15Join us at #SREcon15
Join us at #SREcon15
 
Fighting Email Abuse with DMARC
Fighting Email Abuse with DMARCFighting Email Abuse with DMARC
Fighting Email Abuse with DMARC
 
Operational Costs of Technical Debt
Operational Costs of Technical DebtOperational Costs of Technical Debt
Operational Costs of Technical Debt
 

Recently uploaded

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Assessing stages of practice

  • 1. STAGES OF PRACTICESTAGES OF PRACTICE ININ SITE RELIABILITYSITE RELIABILITY ENGINEERINGENGINEERING
  • 2. Stages of PracticeStages of Practice Shu (obey/basics) 守 Ha (detach/ready to learn) 破 Ri (separate/intuitive) 離
  • 3. Stages of PracticeStages of Practice Innocent Shu (obey) Novice Beginner Competent Ha (detach) Pro cient Ri (separate) Master   Expert/Researcher
  • 4. Signposts of SRE PracticeSignposts of SRE Practice Incident Response Postmortems Incident Prevention Service Level Handling: Indicators, Objectives, Agreements (IOAs) Monitoring Capacity Planning and Forecasting Performance Management
  • 6. Shu Signposts:Shu Signposts: Incident ResponseIncident Response Novice “Alarmed” by incidents Primarily external sourced with inconsistent response Beginner “Fears” incidents E ective response requires speci c people Competent “Aware” that incidents are normal Well de ned handling process
  • 7. Ha-Ri Signposts:Ha-Ri Signposts: Incident ResponseIncident Response Pro cient “Accept” incidents as a normal Some inter-team coordination planning Master “Embrace” incidents as learning experiences Well documented processes and procedures with learning inputs to the process
  • 9. Shu Signposts:Shu Signposts: PostmortemsPostmortems Novice “Blameful”, only for crisis incidents Looking for a scapegoat Beginner Only performed for major incidents Looking for a cause with a focus on mistakes Competent More common, starting to look past blaming Focus on improving local processes
  • 10. Ha-Ri Signposts:Ha-Ri Signposts: PostmortemsPostmortems Pro cient “Blameless”, used consistently Action items feed back to improve systems & processes Master Used to derive “meta”-learnings Applying learnings across the system
  • 12. Shu Signposts:Shu Signposts: Incident PreventionIncident Prevention Novice Focus on remediation (docs & metrics) for manually-identi ed, static, contributory causes Beginner Documentation done to an “acceptable” level Static & action-based causes recognized Competent Focus on team response to incidents, maintaining docs
  • 13. Ha-Ri Signposts:Ha-Ri Signposts: Incident PreventionIncident Prevention Pro cient Early phases of chaos engineering - scheduled Limited randomized chaos engineering Master Chaos engineering as a tool to manage to an SLO Focus on general hygiene of operational environment
  • 14. Signposts:Signposts: Service Level HandlingService Level Handling SLIs / SLOs / SLAsSLIs / SLOs / SLAs
  • 15. Shu Signposts:Shu Signposts: SL[IOA]sSL[IOA]s Novice Externally imposed (SLA), if any On paper, not necessarily measured May be manually calculated for contractual needs Beginner Recognizes the di erence in these terms Measures “easy” things Competent De ned and measured primary characteristics Measures internal SLOs (80+%), not just contractual performance
  • 16. Ha-Ri Signposts:Ha-Ri Signposts: SL[IOA]sSL[IOA]s Pro cient Well developed cascade of measures Historical record and correlation to events Master Meaningful measures throughout the system
  • 18. Shu Signposts:Shu Signposts: MonitoringMonitoring Novice No baseline metrics established Beginner “OS level” or “out of the box”, inconsistent monitoring Partial baselines being developed Competent Consistent baseline monitoring across entire system Able to determine statistical anomalies
  • 19. Ha-Ri Signposts:Ha-Ri Signposts: MonitoringMonitoring Pro cient Thorough instrumentation of all service components Able to correlate internal and external measures Master Data observable upon demand Automated correlation and anomaly detection
  • 21. Shu Signposts:Shu Signposts: Capacity Planning and ForecastingCapacity Planning and Forecasting Novice Frequently running out of resources Throw hardware at the problem Beginner Reactive, manual and/or time-consuming Some metrics, but incomplete Competent Able to identify and react to danger situations before crisis Coverage of the most critical 20-50%
  • 22. Ha-Ri Signposts:Ha-Ri Signposts: Capacity Planning and ForecastingCapacity Planning and Forecasting Pro cient Nearly complete coverage Able to reliably predict capacity in the short term (1+ purchase cycles) Established methods for new service/feature handling Master Able to reliably predict capacity 3-4 purchase cycles out
  • 24. Shu Signposts:Shu Signposts: Performance ManagementPerformance Management Novice “The site is up, isn’t that good enough?” Beginner Selective monitoring Competent Comprehensive monitoring, but still reactive on regressions
  • 25. Ha-Ri Signposts:Ha-Ri Signposts: Capacity Planning and ForecastingCapacity Planning and Forecasting Pro cient Able to prevent regressions through pre- release analysis/modeling/validation Initial “large market” segmentation Master Finely granular performance monitoring and management
  • 26. Assessing Your Organization’sAssessing Your Organization’s Level of PracticeLevel of Practice Mock Assessment Search Monitoring Novice Competent Pro.. BeginnerBeg.. I-Response Beginner SLx Novice Beginner I-Prevent PM Com.. NoviceNovice
  • 27. Other Potential Areas to EvaluateOther Potential Areas to Evaluate Observability Tooling & Capabilities Error Budget De nition and Usage Continuous Budget Tracking (vs. binary Bang-bang handling) Change Management Practices Full Service Lifecycle Reliability: Do Your Services “Plan for Retirement”?
  • 28. Even More Potential Areas toEven More Potential Areas to EvaluateEvaluate New Services: Intro to Stability Arc Toil Fraction Oncall Sustainability Ryan Franz (Etsy): Mean Time to Sleep (MTTS)
  • 29. EachEach ‘9’‘9’ will cost you morewill cost you more than the one before it…than the one before it…   Org-wide Practice Adoption ?Org-wide Practice Adoption ? Practices Under Pressure?Practices Under Pressure?
  • 30. WillWill ChangeChange Ever Stop?Ever Stop? The root cause for both the functioning and malfunctions in all complex systems is impermanence (…all systems are changeable by nature). Knowing the root cause, we no longer seek it, and instead look for the many conditions that allowed a particular situation to manifest. We accept that not all conditions are knowable or xable. – Dave Zwieback Beyond Blame . . . and that’s a. . . and that’s a goodgood thingthing
  • 31. . . . the real story . . . goes on forever . . . every chapter is better than the one before. Continuing the conversation. . .Continuing the conversation. . . Twitter: @DrKurtATwitter: @DrKurtA https://linkedin.com/in/kurta1