Handling incidents collaboratively is like
solving a Rubik’s Cube
More complex communication
[Diagram: Kubernetes master (API server), Worker 1, Worker 2, Pod 1, Pod 2]
@nele_lea
@nlea@social.anoxion.de
@nlea
Resolve
Understanding Causality
Random
fact
1. Understanding 2. Fixing
- Retry
- Restart
- Rolling back to an old version
Defining a Workflow
Prevent
Photo by Scott Sanker on Unsplash
Retry Strategy
Documentation
Best Practice
Discover
Make your System Observable
Telemetry Data
- Logs
- Metrics
- Traces
Observability
Application
Instrument
Observability data backend
Query, alerting, and visualization platform
OpenTelemetry
- CNCF project
- Vendor neutral
- Merger of OpenTracing and OpenCensus
- Specifications, protocols, APIs and SDKs
- No telemetry data backend!
But what to instrument?
What do SLOs mean?
How do I write queries to understand the metrics that I am collecting?
Questions raised by application developers
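One way to make the SLO question concrete is the error budget: the amount of unavailability that an availability target still allows within a time window. The sketch below is only an illustration; the function name and the example numbers (a 99.9% target over 30 days) are assumptions and do not come from the talk.

```python
def error_budget_minutes(slo_target, window_days):
    """Return the minutes of unavailability an availability SLO permits.

    slo_target is a fraction (for example 0.999), window_days the length
    of the evaluation window. Both values are illustrative assumptions.
    """
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - slo_target)


# Hypothetical target: 99.9% availability over a 30-day window.
budget = error_budget_minutes(0.999, 30)
```

With these assumed numbers the budget comes to roughly 43 minutes per 30 days, which gives developers and SREs a shared figure to discuss.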
Auto Instrumentation?
Function Level Metrics
https://github.com/autometrics-dev
Autometrics
Application functions
Instrument
Latency, error rate, and request rate
Metrics backend
Query, alerting, and visualization platform
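The idea of function-level metrics can be sketched with a plain decorator that records request count, error count, and cumulative latency per function. This is not the Autometrics API: the `track` decorator and the in-memory `METRICS` store are hypothetical names chosen for illustration, and a real setup would export the values to a metrics backend.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store, keyed by function name.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def track(func):
    """Record call count, error count, and cumulative latency for func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stats = METRICS[func.__name__]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise  # the caller still sees the original error
        finally:
            # Runs on success and on failure, so every call is counted.
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start
    return wrapper


@track
def divide(a, b):
    """Example function to instrument."""
    return a / b
```

Because the metrics are keyed by function name, a latency or error-rate question can be traced back to the function that caused it.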
Demo
Kubetrain.io
berlin@kubetrain.io
🚂
@nele_lea
@nlea@social.anoxion.de
@nlea

Editor's Notes

  1. Handling incidents collaboratively is like solving a Rubik's Cube. Understanding the business outcome and the overall functionality of a system that consists of distributed services, plus the infrastructure components needed to run them at scale, is almost like solving a Rubik's Cube. Once an incident occurs, it is not enough to look at a single side of the cube. To solve the puzzle, all sides need to be considered. Monitoring a distributed system should not be the effort of a single engineering team. Observability should be a goal for all engineering teams. Nevertheless, it is often a mantra just for SRE teams. Coming from the perspective of an application engineer, I will outline how application engineers benefit from understanding infrastructure and common incidents, and how SRE teams benefit from understanding common failures in application code. Let's take a deeper look at what collaboration across different engineering teams means and how it supports the process of solving the Rubik's Cube together.
  2. The side of the application developers (backend and frontend) They live in their IDE
  3. The side of an SRE
  4. Sides where different developer groups meet
  5. Getting the order of the things that have happened. Different engineering teams might look at different tools for that, and that is fine, but when trouble is around the corner it helps if everyone looks at the same picture.
  6. Best practices and documentation can help! Reading them helps even more… Heavily text-based? Not a good idea.
  7. Solution: setting retries, either in the service mesh or in the backend code. Both are possible, but what matters is that the retry strategy is communicated among the different teams.
  8. Best practices and documentation can help! Reading them helps even more… Heavily text-based? Not a good idea.
  9. Different shades of architecture and services
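
Note 7 mentions retries in backend code but leaves the strategy open. One common choice is retrying with exponential backoff, sketched below under assumptions: the `retry` helper, its parameter names, and its default values are hypothetical and not taken from the talk.

```python
import time


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation until it succeeds or the attempts are used up.

    The wait doubles after each failure: base_delay, 2 * base_delay, ...
    The sleep function is injectable so the delays can be observed in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Whichever layer performs the retries, the attempt count and delays are the values worth writing down for the other teams, since retries in both the service mesh and the backend code multiply each other.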