SlideShare a Scribd company logo
1 of 33
Download to read offline
bertjan@openvalue.eu
Debugging distributed systems
Bert Jan Schrijver
@bjschrijver
Debugging distributed systems: the good parts
bertjan@openvalue.eu
Bert Jan Schrijver
@bjschrijver
Networking 101
How the internet works
Why?
Bert Jan Schrijver
L e t ’ s m e e t
@bjschrijver
Why are distributed
systems difficult?
Networking 101
What?
Why?
✅
Demo
War stories
Conclusion
W h a t ‘ s n e x t ?
Outline
A structured approach
@bjschrijver
What is a distributed system?
A distributed system is a system whose
components are located on different
networked computers, which
communicate and coordinate their actions
by passing messages to one another.
• Concurrency of components


• Lack of a global clock


• Independent failure of components


• Distributed systems are harder to reason
about
Characteristics of distributed systems
Source: http://www.nasa.gov/images/content/218652main_STOCC_FS_img_lg.jpg
Working with distributed systems is
fundamentally different from writing
software on a single computer - and the
main difference is that there are lots of new
and exciting ways for things to go wrong.


- Martin Kleppmann
“
”
Photo: Dave Lehl
What could possibly go wrong?
“ ”
Photo: Dave Lehl
OSI & TCP/IP
Source: https://www.guru99.com/difference-tcp-ip-vs-osi-model.html
.. in your browser’s address bar and press Enter
What happens when you type google.com…
Source: https://github.com/alex/what-happens-when
13
Source: https://7216-presscdn-0-76-pagely.netdna-ssl.com/wp-content/uploads/2011/12/confused-man-single-good-men.jpg
Where do I start?
Step 1: Observe & document
• What do you know about the problem?


• Inspect logging, errors, metrics, tracing


• Draw the path from source to target - what’s
in between? Focus on details!


• Document what you know


• Can we reproduce in a test?


• By injecting errors, for example
Tools


Whiteboard,
documentation, logging,
metrics, tracing
(opentracing.io), tests,
Jepsen
Step 1: Observe & document
Step 2: Create minimal reproducer
• Goal: maximise the amount of debugging
cycles


• Focus on short development iterations /
feedback loops


• Get close to the action!
Tools


IDE, Shell scripts,


SSH tunnels, Curl
Step 3: Debug client side
• Focus on eliminating anything that could be
wrong on the client side


• Are we connecting to the right host?


• Do we send the right message?


• Do we receive a response?


• Not much different from local


debugging
Tools


IDE, debugger,
logging
Step 4: Check DNS & routing
• DNS:


• Make sure you know what IP address the
hostname should resolve to


• Verify that this actually happens


at the client


• Routing:


• Verify you can reach the


target machine
Tools


host, nslookup,
dig, whois, ping,
traceroute,
nslookup.io,
dnschecker.org
Step 5: Check connection
• Can we connect to the port?


• If not, do we get a REJECT or a DROP?


• Does the connection open and stay open?


• Are we talking TLS?


• What is the connection speed


between us?
Tools


telnet, nc, curl,
iperf
Step 6: Inspect traffic / messages
• Do we send the right request?


• Do we receive the right response?


• How do we know?


• How do we handle TLS?


• Are there any load balancers


or proxies in between?
Tools


curl, wireshark,
tcpdump, network
tab in browser,
mitm/tls proxy
Step 7: Debug server side
• Inspect the remote host


• Can we attach a remote debugger?


• See https://youtube.com/OpenValue


• Profiling


• Strace


• Java Flight Recorder (JFR)


Tools


SSH tunnels, remote
debugger, profiler,
strace,


Java Flight Recorder
Step 8: Wrap up & post mortem
• Document the issue:


• Timeline


• What did we see?


• Why did it happen?


• What was the impact?


• How did we find out?


• What did we do to mitigate and fix?


• What should we do to prevent


repetition?
Tools


Whiteboard,
documentation
If you really want a reliable system, you
have to understand what its failure modes
are. You have to actually have witnessed
it misbehaving.


- Jason Cahoon
“
”
Distributed systems war stories
The time where it worked half of the time…
The one at a school…
The one with breaking news…
Summary: a structured approach
to debugging distributed systems
@bjschrijver
Check DNS & routing
Check connection
Debug client side
Create minimal reproducer
Debug server side
Observe & document
Wrap up & post mortem
Inspect traffic / messages
Source: https://cdn2.vox-cdn.com/thumbor/J9OqPYS7FgI9fjGhnF7AFh8foVY=/148x0:1768x1080/1280x854/cdn0.vox-cdn.com/uploads/chorus_image/image/46147742/cute-success-kid-1920x1080.0.0.jpg
THAT’S IT.


NOW GO KICK SOME ASS!
Questions?


@bjschrijver
Thanks for your time.
Got feedback? Tweet it!
All pictures belong


to their respective


authors
@bjschrijver

More Related Content

What's hot

Capital One DevOps Case Study: A Bank with the Heart of Tech Company
Capital One DevOps Case Study: A Bank with the Heart of Tech CompanyCapital One DevOps Case Study: A Bank with the Heart of Tech Company
Capital One DevOps Case Study: A Bank with the Heart of Tech CompanySimform
 
DevOps!! 도데체 왜, 어떻게 할까??
DevOps!! 도데체 왜, 어떻게 할까??DevOps!! 도데체 왜, 어떻게 할까??
DevOps!! 도데체 왜, 어떻게 할까??Joseph Kim
 
Git e GitHub: Versionamento de Código Fácil
Git e GitHub: Versionamento de Código FácilGit e GitHub: Versionamento de Código Fácil
Git e GitHub: Versionamento de Código FácilTiago Antônio da Silva
 
Funny stories and anti-patterns from DevOps landscape
Funny stories and anti-patterns from DevOps landscapeFunny stories and anti-patterns from DevOps landscape
Funny stories and anti-patterns from DevOps landscapeMikalai Alimenkou
 
Improve your Product Backlog Refinement (PBR) Process
Improve your Product Backlog Refinement (PBR) ProcessImprove your Product Backlog Refinement (PBR) Process
Improve your Product Backlog Refinement (PBR) ProcessAlexey Krivitsky
 
Understanding GIT and Version Control
Understanding GIT and Version ControlUnderstanding GIT and Version Control
Understanding GIT and Version ControlSourabh Sahu
 
Deloitte lean agile state of the nation
Deloitte lean   agile state of the nationDeloitte lean   agile state of the nation
Deloitte lean agile state of the nationAlexis Hui
 
DevSecOps Fundamentals and the Scars to Prove it.
DevSecOps Fundamentals and the Scars to Prove it.DevSecOps Fundamentals and the Scars to Prove it.
DevSecOps Fundamentals and the Scars to Prove it.Matt Tesauro
 
CONTINUOUS INTEGRATION WITH JENKINS AND GIT
CONTINUOUS INTEGRATION WITH JENKINS AND GITCONTINUOUS INTEGRATION WITH JENKINS AND GIT
CONTINUOUS INTEGRATION WITH JENKINS AND GITBenjamin Lutaaya
 
Five Key Numbers to Gauge your Agile Engineering Efforts
Five Key Numbers to Gauge your Agile Engineering EffortsFive Key Numbers to Gauge your Agile Engineering Efforts
Five Key Numbers to Gauge your Agile Engineering EffortsJeff Nielsen
 
Team Topologies in action - early results from industry - DOES London Virtual...
Team Topologies in action - early results from industry - DOES London Virtual...Team Topologies in action - early results from industry - DOES London Virtual...
Team Topologies in action - early results from industry - DOES London Virtual...Matthew Skelton
 
What Is DevOps? | Introduction To DevOps | DevOps Tools | DevOps Tutorial | D...
What Is DevOps? | Introduction To DevOps | DevOps Tools | DevOps Tutorial | D...What Is DevOps? | Introduction To DevOps | DevOps Tools | DevOps Tutorial | D...
What Is DevOps? | Introduction To DevOps | DevOps Tools | DevOps Tutorial | D...Edureka!
 
Getting Started - Introduction to Backlog Grooming
Getting Started - Introduction to Backlog GroomingGetting Started - Introduction to Backlog Grooming
Getting Started - Introduction to Backlog GroomingEasy Agile
 

What's hot (20)

Capital One DevOps Case Study: A Bank with the Heart of Tech Company
Capital One DevOps Case Study: A Bank with the Heart of Tech CompanyCapital One DevOps Case Study: A Bank with the Heart of Tech Company
Capital One DevOps Case Study: A Bank with the Heart of Tech Company
 
Scrum team and efficiency
Scrum team and efficiencyScrum team and efficiency
Scrum team and efficiency
 
Modelo incremental
Modelo incrementalModelo incremental
Modelo incremental
 
DevOps!! 도데체 왜, 어떻게 할까??
DevOps!! 도데체 왜, 어떻게 할까??DevOps!! 도데체 왜, 어떻게 할까??
DevOps!! 도데체 왜, 어떻게 할까??
 
Git e GitHub: Versionamento de Código Fácil
Git e GitHub: Versionamento de Código FácilGit e GitHub: Versionamento de Código Fácil
Git e GitHub: Versionamento de Código Fácil
 
Funny stories and anti-patterns from DevOps landscape
Funny stories and anti-patterns from DevOps landscapeFunny stories and anti-patterns from DevOps landscape
Funny stories and anti-patterns from DevOps landscape
 
Improve your Product Backlog Refinement (PBR) Process
Improve your Product Backlog Refinement (PBR) ProcessImprove your Product Backlog Refinement (PBR) Process
Improve your Product Backlog Refinement (PBR) Process
 
Understanding GIT and Version Control
Understanding GIT and Version ControlUnderstanding GIT and Version Control
Understanding GIT and Version Control
 
Deloitte lean agile state of the nation
Deloitte lean   agile state of the nationDeloitte lean   agile state of the nation
Deloitte lean agile state of the nation
 
DevOps
DevOps DevOps
DevOps
 
What is Scrum?
What is Scrum?What is Scrum?
What is Scrum?
 
DevSecOps Fundamentals and the Scars to Prove it.
DevSecOps Fundamentals and the Scars to Prove it.DevSecOps Fundamentals and the Scars to Prove it.
DevSecOps Fundamentals and the Scars to Prove it.
 
CONTINUOUS INTEGRATION WITH JENKINS AND GIT
CONTINUOUS INTEGRATION WITH JENKINS AND GITCONTINUOUS INTEGRATION WITH JENKINS AND GIT
CONTINUOUS INTEGRATION WITH JENKINS AND GIT
 
DevOps
DevOpsDevOps
DevOps
 
Five Key Numbers to Gauge your Agile Engineering Efforts
Five Key Numbers to Gauge your Agile Engineering EffortsFive Key Numbers to Gauge your Agile Engineering Efforts
Five Key Numbers to Gauge your Agile Engineering Efforts
 
Devops
DevopsDevops
Devops
 
Team Topologies in action - early results from industry - DOES London Virtual...
Team Topologies in action - early results from industry - DOES London Virtual...Team Topologies in action - early results from industry - DOES London Virtual...
Team Topologies in action - early results from industry - DOES London Virtual...
 
What Is DevOps? | Introduction To DevOps | DevOps Tools | DevOps Tutorial | D...
What Is DevOps? | Introduction To DevOps | DevOps Tools | DevOps Tutorial | D...What Is DevOps? | Introduction To DevOps | DevOps Tools | DevOps Tutorial | D...
What Is DevOps? | Introduction To DevOps | DevOps Tools | DevOps Tutorial | D...
 
Getting Started - Introduction to Backlog Grooming
Getting Started - Introduction to Backlog GroomingGetting Started - Introduction to Backlog Grooming
Getting Started - Introduction to Backlog Grooming
 
Agile project discovery
Agile project discoveryAgile project discovery
Agile project discovery
 

Similar to Mastering Microservices 2022 - Debugging distributed systems

JavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsJavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsBert Jan Schrijver
 
GOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systemsGOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systemsBert Jan Schrijver
 
Devoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsDevoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsBert Jan Schrijver
 
Arnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsArnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsBert Jan Schrijver
 
JUG CH September 2021 - Debugging distributed systems
JUG CH September 2021 - Debugging distributed systemsJUG CH September 2021 - Debugging distributed systems
JUG CH September 2021 - Debugging distributed systemsBert Jan Schrijver
 
Incident Response Fails
Incident Response FailsIncident Response Fails
Incident Response FailsMichael Gough
 
WTF is Penetration Testing v.2
WTF is Penetration Testing v.2WTF is Penetration Testing v.2
WTF is Penetration Testing v.2Scott Sutherland
 
Pentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated TestingPentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated TestingAndrew McNicol
 
When Security Tools Fail You
When Security Tools Fail YouWhen Security Tools Fail You
When Security Tools Fail YouMichael Gough
 
FUEL_USERS_GROUP
FUEL_USERS_GROUPFUEL_USERS_GROUP
FUEL_USERS_GROUPWill Pearce
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Professional Hacking in 2011
Professional Hacking in 2011Professional Hacking in 2011
Professional Hacking in 2011securityaegis
 
Monitoring microservices
Monitoring microservicesMonitoring microservices
Monitoring microservicesWilliam Brander
 
Owning windows 8 with human interface devices
Owning windows 8 with human interface devicesOwning windows 8 with human interface devices
Owning windows 8 with human interface devicesNikhil Mittal
 
From SLO to GOTY
From SLO to GOTYFrom SLO to GOTY
From SLO to GOTYScyllaDB
 
Creating Havoc using Human Interface Device
Creating Havoc using Human Interface DeviceCreating Havoc using Human Interface Device
Creating Havoc using Human Interface DevicePositive Hack Days
 
Super Easy Memory Forensics
Super Easy Memory ForensicsSuper Easy Memory Forensics
Super Easy Memory ForensicsIIJ
 

Similar to Mastering Microservices 2022 - Debugging distributed systems (20)

JavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsJavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systems
 
GOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systemsGOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systems
 
Debugging distributed systems
Debugging distributed systemsDebugging distributed systems
Debugging distributed systems
 
Devoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsDevoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systems
 
Arnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsArnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systems
 
Debugging distributed systems
Debugging distributed systemsDebugging distributed systems
Debugging distributed systems
 
JUG CH September 2021 - Debugging distributed systems
JUG CH September 2021 - Debugging distributed systemsJUG CH September 2021 - Debugging distributed systems
JUG CH September 2021 - Debugging distributed systems
 
Incident Response Fails
Incident Response FailsIncident Response Fails
Incident Response Fails
 
WTF is Penetration Testing v.2
WTF is Penetration Testing v.2WTF is Penetration Testing v.2
WTF is Penetration Testing v.2
 
Pentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated TestingPentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated Testing
 
When Security Tools Fail You
When Security Tools Fail YouWhen Security Tools Fail You
When Security Tools Fail You
 
FUEL_USERS_GROUP
FUEL_USERS_GROUPFUEL_USERS_GROUP
FUEL_USERS_GROUP
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Professional Hacking in 2011
Professional Hacking in 2011Professional Hacking in 2011
Professional Hacking in 2011
 
Monitoring microservices
Monitoring microservicesMonitoring microservices
Monitoring microservices
 
Heartbleed
HeartbleedHeartbleed
Heartbleed
 
Owning windows 8 with human interface devices
Owning windows 8 with human interface devicesOwning windows 8 with human interface devices
Owning windows 8 with human interface devices
 
From SLO to GOTY
From SLO to GOTYFrom SLO to GOTY
From SLO to GOTY
 
Creating Havoc using Human Interface Device
Creating Havoc using Human Interface DeviceCreating Havoc using Human Interface Device
Creating Havoc using Human Interface Device
 
Super Easy Memory Forensics
Super Easy Memory ForensicsSuper Easy Memory Forensics
Super Easy Memory Forensics
 

Recently uploaded

Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 

Recently uploaded (20)

Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 

Mastering Microservices 2022 - Debugging distributed systems

  • 2. Debugging distributed systems: the good parts bertjan@openvalue.eu Bert Jan Schrijver @bjschrijver Networking 101 How the internet works
  • 4. Bert Jan Schrijver L e t ’ s m e e t @bjschrijver
  • 5. Why are distributed systems difficult? Networking 101 What? Why? ✅ Demo War stories Conclusion W h a t ‘ s n e x t ? Outline A structured approach @bjschrijver
  • 6. What is a distributed system?
  • 7. A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another.
  • 8. • Concurrency of components • Lack of a global clock • Independent failure of components 
 • Distributed systems are harder to reason about Characteristics of distributed systems Source: http://www.nasa.gov/images/content/218652main_STOCC_FS_img_lg.jpg
  • 9. Working with distributed systems is fundamentally different from writing software on a single computer - and the main difference is that there are lots of new and exciting ways for things to go wrong. - Martin Kleppmann “ ” Photo: Dave Lehl
  • 10. What could possibly go wrong? “ ” Photo: Dave Lehl
  • 11. OSI & TCP/IP Source: https://www.guru99.com/difference-tcp-ip-vs-osi-model.html
  • 12. .. in your browser’s address bar and press Enter What happens when you type google.com… Source: https://github.com/alex/what-happens-when
  • 13. 13
  • 15. Step 1: Observe & document • What do you know about the problem? • Inspect logging, errors, metrics, tracing • Draw the path from source to target - what’s in between? Focus on details! • Document what you know • Can we reproduce in a test? • By injecting errors, for example Tools Whiteboard, documentation, logging, metrics, tracing (opentracing.io), tests, Jepsen
  • 16. Step 1: Observe & document
  • 17. Step 2: Create minimal reproducer • Goal: maximise the amount of debugging cycles • Focus on short development iterations / feedback loops • Get close to the action! Tools IDE, Shell scripts, SSH tunnels, Curl
  • 18. Step 3: Debug client side • Focus on eliminating anything that could be wrong on the client side • Are we connecting to the right host? • Do we send the right message? • Do we receive a response? • Not much different from local 
 debugging Tools IDE, debugger, logging
  • 19. Step 4: Check DNS & routing • DNS: • Make sure you know what IP address the hostname should resolve to • Verify that this actually happens 
 at the client • Routing: • Verify you can reach the 
 target machine Tools host, nslookup, dig, whois, ping, traceroute, nslookup.io, dnschecker.org
  • 20. Step 5: Check connection • Can we connect to the port? • If not, do we get a REJECT or a DROP? • Does the connection open and stay open? • Are we talking TLS? • What is the connection speed 
 between us? Tools telnet, nc, curl, iperf
  • 21. Step 6: Inspect traffic / messages • Do we send the right request? • Do we receive the right response? • How do we know? • How do we handle TLS? • Are there any load balancers 
 or proxies in between? Tools curl, wireshark, tcpdump, network tab in browser, mitm/tls proxy
  • 22. Step 7: Debug server side • Inspect the remote host • Can we attach a remote debugger? • See https://youtube.com/OpenValue • Profiling • Strace • Java Flight Recorder (JFR) 
 Tools SSH tunnels, remote debugger, profiler, strace, Java Flight Recorder
  • 23. Step 8: Wrap up & post mortem • Document the issue: • Timeline • What did we see? • Why did it happen? • What was the impact? • How did we find out? • What did we do to mitigate and fix? • What should we do to prevent 
 repetition? Tools Whiteboard, documentation
  • 24. If you really want a reliable system, you have to understand what its failure modes are. You have to actually have witnessed it misbehaving. - Jason Cahoon “ ”
  • 26. The time where it worked half of the time…
  • 27. The one at a school…
  • 28.
  • 29. The one with breaking news…
  • 30. Summary: a structured approach to debugging distributed systems @bjschrijver Check DNS & routing Check connection Debug client side Create minimal reproducer Debug server side Observe & document Wrap up & post mortem Inspect traffic / messages
  • 33. Thanks for your time. Got feedback? Tweet it! All pictures belong to their respective authors @bjschrijver