Komodor <> Epsagon | May 2021
Tracking changes
in a distributed system
The dark side of changes
Cloud native | March 2021
Who am I?
● The CTO and co-founder of Komodor, a
startup building the first k8s-native
troubleshooting platform.
● A big believer in dev empowerment and
moving fast.
● Worked at eBay, Forter, and Rookout
(first developer); extensive backend and
infra development experience (“DevOps”)
● K8s fan 😃
Agenda
1. Why should you care what changed
2. What is a change
3. Why is it so hard to find what changed
4. The future of change tracking
5. What can you do?
Why should you care
what changed?
● Issues happen on an hourly basis
● They range from complete system downtime to a
small bug in staging
● 85% of incidents can be traced to system
changes
● Most troubleshooting time is spent
identifying the issue
What is a change?
Any action that alters the system
state.
For example:
● Code deployments
● Infra changes (cloud/on-prem)
● Config changes
● Feature flags
● Job changes
● DB migrations
● Third-party changes
● Customer usage or data*
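One way to make this definition concrete is to record every kind of change in a single shape. The sketch below is only an illustration: the `ChangeKind` and `ChangeEvent` names, the fields, and the sample values are assumptions, not something taken from the talk or from any tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class ChangeKind(Enum):
    """Hypothetical categories mirroring the examples on this slide."""
    CODE_DEPLOY = "code_deploy"
    INFRA = "infra"
    CONFIG = "config"
    FEATURE_FLAG = "feature_flag"
    JOB = "job"
    DB_MIGRATION = "db_migration"
    THIRD_PARTY = "third_party"
    CUSTOMER_USAGE = "customer_usage"


@dataclass(frozen=True)
class ChangeEvent:
    """A single action that altered the system state."""
    kind: ChangeKind
    service: str
    author: str
    summary: str
    # Timestamps are stored in UTC so events from different sources compare cleanly.
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Made-up sample event, for illustration only.
event = ChangeEvent(ChangeKind.FEATURE_FLAG, "checkout", "dana", "enabled new-pricing flag")
print(f"[{event.kind.value}] {event.service}: {event.summary} (by {event.author})")
```

Giving a feature-flag toggle and a code deployment the same record type is what lets them be listed, sorted, and searched together later.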
Why is it so hard
to find what changed?
Modern Haystack
1. Relies heavily on third parties (cloud, APIs, etc.)
2. Includes dozens of microservices
3. Changes rapidly (the more the better)
4. Everyone can make a change (shift left)
TL;DR
Modern systems are basically a super
complex puzzle that changes rapidly.
What makes it
extra hard?
1. Everything is connected - A ripple effect can cause an
“unrelated” change to crash the system
2. Dark data - Unaudited changes are happening all
day long! (cloud changes, deploys to production, third-party
changes, etc.)
3. Scattered data - Tracking changes efficiently requires
opening several different systems and querying each
individually
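The "scattered data" point can be illustrated by merging per-source change feeds into one timeline instead of querying each system by hand. This is a minimal sketch under assumptions: the `Change` tuple, the source names (`ci`, `cloud-audit`, `feature-flags`), and the sample entries are all hypothetical, and real feeds would come from each system's own API or audit log.

```python
import heapq
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple


class Change(NamedTuple):
    at: datetime
    source: str
    service: str
    summary: str


def merged_timeline(*feeds: Iterable[Change]) -> Iterator[Change]:
    """Merge per-source feeds into one timeline, oldest change first."""
    # heapq.merge expects each input to be sorted already, so sort each feed first.
    ordered = (sorted(feed, key=lambda c: c.at) for feed in feeds)
    return heapq.merge(*ordered, key=lambda c: c.at)


def utc(hour: int, minute: int) -> datetime:
    """Helper for building the made-up sample timestamps below."""
    return datetime(2021, 5, 1, hour, minute, tzinfo=timezone.utc)


ci = [Change(utc(10, 5), "ci", "checkout", "deploy v42")]
cloud = [
    Change(utc(9, 50), "cloud-audit", "payments-db", "instance resized"),
    Change(utc(10, 20), "cloud-audit", "gateway", "security group edited"),
]
flags = [Change(utc(10, 12), "feature-flags", "pricing", "flag new-pricing enabled")]

timeline = list(merged_timeline(ci, cloud, flags))
for change in timeline:
    print(change.at.isoformat(), change.source, change.service, change.summary)
```

Once the feeds share one timeline, a single query replaces opening each system in turn.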
What does it look like?
[Diagram: investigating an alert from #alerts-production. Steps shown: original alert, current status, find last job, what code changed, “who changed what”. Outcome: a change in another, “unrelated” service was the root cause.]
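The investigation in this diagram boils down to one question: what changed shortly before the alert, in any service? A rough sketch of that lookup follows. It assumes a hypothetical `Change` record, a one-hour look-back window chosen only for illustration, and made-up sample data. It deliberately does not filter by the alerting service, since the root cause may sit in an "unrelated" one.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple


class Change(NamedTuple):
    at: datetime
    service: str
    author: str
    summary: str


def suspects(changes: List[Change], alert_at: datetime,
             lookback: timedelta = timedelta(hours=1)) -> List[Change]:
    """Return changes made in the window before the alert, newest first."""
    window_start = alert_at - lookback
    recent = [c for c in changes if window_start <= c.at <= alert_at]
    return sorted(recent, key=lambda c: c.at, reverse=True)


def utc(hour: int, minute: int) -> datetime:
    """Helper for building the made-up sample timestamps below."""
    return datetime(2021, 5, 1, hour, minute, tzinfo=timezone.utc)


changes = [
    Change(utc(8, 0), "checkout", "dana", "deploy v41"),               # too old
    Change(utc(9, 45), "checkout", "dana", "flag new-pricing enabled"),
    Change(utc(10, 10), "rate-limiter", "lee", "config: lowered quota"),
    Change(utc(10, 45), "search", "sam", "deploy v7"),                 # after the alert
]

# The alert fired on checkout at 10:30, but every service is considered.
for change in suspects(changes, alert_at=utc(10, 30)):
    print(change.at.strftime("%H:%M"), change.service, change.author, change.summary)
```

Listing the newest change first puts the `rate-limiter` config edit at the top, even though the alert came from `checkout`.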
All indicators of change
tracking & troubleshooting are
moving in the same direction
● Velocity is ever growing
● More people can make changes
● Systems are becoming more complex
So, what
can you
do?
10 quick tips
1. Admit you have a problem
2. Automate change notifications to Slack
(or monitoring tools)
3. Use IaC (infrastructure as code) as much as possible
4. Create a change process (even if just for reporting)
5. Improve cross-team communication while troubleshooting
6. Eliminate unaudited changes: use a process or a tool
7. Use distributed tracing to better understand system topology
8. Use tags, annotations, and metadata with the relevant version
9. GitOps can eliminate some of the issues
10. Create playbooks with links to the relevant tools and their change logs
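Tips 2 and 8 can be combined: post each change to a Slack channel and include the version in the message. The sketch below assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names, the message wording, and the sample values are illustrative assumptions, and the webhook URL is something you would supply yourself.

```python
import json
import urllib.request


def build_payload(service: str, version: str, author: str, summary: str) -> dict:
    """Build the JSON body for a Slack incoming webhook message."""
    text = f"*{service}* changed to `{version}` by {author}: {summary}"
    return {"text": text}


def notify(webhook_url: str, payload: dict) -> None:
    """POST the payload to the webhook; raises on network or HTTP errors."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()


# Made-up sample change; in practice this would run as a step in the CI/CD pipeline.
payload = build_payload("checkout", "v42", "dana", "deploy from CI")
print(json.dumps(payload))
# notify(webhook_url, payload)  # left commented out: needs your own webhook URL
```

Calling this from the deploy pipeline rather than by hand is what makes the notification automatic, so that no deploy goes unannounced.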
Troubleshooting
can be easy 😎
BTW, We are HIRING!
