SlideShare a Scribd company logo
bertjan@openvalue.eu
Debugging distributed systems
Bert Jan Schrijver
@bjschrijver
Debugging distributed systems: the good parts
bertjan@openvalue.eu
Bert Jan Schrijver
@bjschrijver
Networking 101
How the internet works
Why?
Bert Jan Schrijver
L e t ’ s m e e t
@bjschrijver
Why are distributed
systems difficult?
Networking 101
What?
Why?
✅
Demo
War stories
Conclusion
W h a t ‘ s n e x t ?
Outline
A structured approach
@bjschrijver
What is a distributed system?
A distributed system is a system whose
components are located on different
networked computers, which
communicate and coordinate their actions
by passing messages to one another.
• Concurrency of components


• Lack of a global clock


• Independent failure of components


• Distributed systems are harder to reason
about
Characteristics of distributed systems
Source: http://www.nasa.gov/images/content/218652main_STOCC_FS_img_lg.jpg
Working with distributed systems is
fundamentally different from writing
software on a single computer - and the
main difference is that there are lots of new
and exciting ways for things to go wrong.


- Martin Kleppmann
“
”
Photo: Dave Lehl
What could possibly go wrong?
“ ”
Photo: Dave Lehl
OSI & TCP/IP
Source: https://www.guru99.com/difference-tcp-ip-vs-osi-model.html
.. in your browser’s address bar and press Enter
What happens when you type google.com…
Source: https://github.com/alex/what-happens-when
13
Source: https://7216-presscdn-0-76-pagely.netdna-ssl.com/wp-content/uploads/2011/12/confused-man-single-good-men.jpg
Where do I start?
Step 1: Observe & document
• What do you know about the problem?


• Inspect logging, errors, metrics, tracing


• Draw the path from source to target - what’s
in between? Focus on details!


• Document what you know


• Can we reproduce in a test?


• By injecting errors, for example
Tools


Whiteboard,
documentation, logging,
metrics, tracing
(opentracing.io), tests,
Jepsen
Step 1: Observe & document
Step 2: Create minimal reproducer
• Goal: maximise the amount of debugging
cycles


• Focus on short development iterations /
feedback loops


• Get close to the action!
Tools


IDE, Shell scripts,


SSH tunnels, Curl
Step 3: Debug client side
• Focus on eliminating anything that could be
wrong on the client side


• Are we connecting to the right host?


• Do we send the right message?


• Do we receive a response?


• Not much different from local


debugging
Tools


IDE, debugger,
logging
Step 4: Check DNS & routing
• DNS:


• Make sure you know what IP address the
hostname should resolve to


• Verify that this actually happens


at the client


• Routing:


• Verify you can reach the


target machine
Tools


host, nslookup,
dig, whois, ping,
traceroute,
nslookup.io,
dnschecker.org
Step 5: Check connection
• Can we connect to the port?


• If not, do we get a REJECT or a DROP?


• Does the connection open and stay open?


• Are we talking TLS?


• What is the connection speed


between us?
Tools


telnet, nc, curl,
iperf
Step 6: Inspect traffic / messages
• Do we send the right request?


• Do we receive the right response?


• How do we know?


• How do we handle TLS?


• Are there any load balancers


or proxies in between?
Tools


curl, wireshark,
tcpdump, network
tab in browser,
mitm/tls proxy
Step 7: Debug server side
• Inspect the remote host


• Can we attach a remote debugger?


• See https://youtube.com/OpenValue


• Profiling


• Strace Tools


SSH tunnels,
remote debugger,
profiler, strace
Step 8: Wrap up & post mortem
• Document the issue:


• Timeline


• What did we see?


• Why did it happen?


• What was the impact?


• How did we find out?


• What did we do to mitigate and fix?


• What should we do to prevent


repetition?
Tools


Whiteboard,
documentation
If you really want a reliable system, you
have to understand what its failure modes
are. You have to actually have witnessed
it misbehaving.


- Jason Cahoon
“
”
Distributed systems war stories
The time where it worked half of the time…
The one at a school…
The one with breaking news…
Summary: a structured approach
to debugging distributed systems
@bjschrijver
Check DNS & routing
Check connection
Debug client side
Create minimal reproducer
Debug server side
Observe & document
Wrap up & post mortem
Inspect traffic / messages
Source: https://cdn2.vox-cdn.com/thumbor/J9OqPYS7FgI9fjGhnF7AFh8foVY=/148x0:1768x1080/1280x854/cdn0.vox-cdn.com/uploads/chorus_image/image/46147742/cute-success-kid-1920x1080.0.0.jpg
THAT’S IT.


NOW GO KICK SOME ASS!
Questions?


@bjschrijver
Thanks for your time.
Got feedback? Tweet it!
All pictures belong


to their respective


authors
@bjschrijver

More Related Content

Similar to JavaLand 2022 - Debugging distributed systems

Pentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated TestingPentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated Testing
Andrew McNicol
 
Monitoring microservices
Monitoring microservicesMonitoring microservices
Monitoring microservices
William Brander
 
When Security Tools Fail You
When Security Tools Fail YouWhen Security Tools Fail You
When Security Tools Fail You
Michael Gough
 
Blockchain and Hook model of engagement
Blockchain and Hook model of engagement Blockchain and Hook model of engagement
Blockchain and Hook model of engagement
Rajeev Soni
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
Michael Gough
 
Professional Hacking in 2011
Professional Hacking in 2011Professional Hacking in 2011
Professional Hacking in 2011
securityaegis
 
FUEL_USERS_GROUP
FUEL_USERS_GROUPFUEL_USERS_GROUP
FUEL_USERS_GROUPWill Pearce
 
Heartbleed
HeartbleedHeartbleed
Heartbleed
Punit Goswami
 
Owning windows 8 with human interface devices
Owning windows 8 with human interface devicesOwning windows 8 with human interface devices
Owning windows 8 with human interface devices
Nikhil Mittal
 
Rewriting DevOps
Rewriting DevOpsRewriting DevOps
Rewriting DevOps
Matthew Boeckman
 
THOTCON 0x6: Going Kinetic on Electronic Crime Networks
THOTCON 0x6: Going Kinetic on Electronic Crime NetworksTHOTCON 0x6: Going Kinetic on Electronic Crime Networks
THOTCON 0x6: Going Kinetic on Electronic Crime Networks
John Bambenek
 
From SLO to GOTY
From SLO to GOTYFrom SLO to GOTY
From SLO to GOTY
ScyllaDB
 
Chapter 15 incident handling
Chapter 15 incident handlingChapter 15 incident handling
Chapter 15 incident handling
newbie2019
 
2023 NCIT: Introduction to Intrusion Detection
2023 NCIT: Introduction to Intrusion Detection2023 NCIT: Introduction to Intrusion Detection
2023 NCIT: Introduction to Intrusion Detection
APNIC
 
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose TutorialArchitecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
Will Gallego
 
Luncheon 2016-07-16 - Topic 2 - Advanced Threat Hunting by Justin Falck
Luncheon 2016-07-16 -  Topic 2 - Advanced Threat Hunting by Justin FalckLuncheon 2016-07-16 -  Topic 2 - Advanced Threat Hunting by Justin Falck
Luncheon 2016-07-16 - Topic 2 - Advanced Threat Hunting by Justin Falck
North Texas Chapter of the ISSA
 
Heartbleed Bug Vulnerability: Discovery, Impact and Solution
Heartbleed Bug Vulnerability: Discovery, Impact and SolutionHeartbleed Bug Vulnerability: Discovery, Impact and Solution
Heartbleed Bug Vulnerability: Discovery, Impact and Solution
CASCouncil
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mind
bcantrill
 
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017  Succeeding in the Cloud – the guidebook of FailJax Devops 2017  Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
Steve Poole
 
Super Easy Memory Forensics
Super Easy Memory ForensicsSuper Easy Memory Forensics
Super Easy Memory Forensics
IIJ
 

Similar to JavaLand 2022 - Debugging distributed systems (20)

Pentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated TestingPentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated Testing
 
Monitoring microservices
Monitoring microservicesMonitoring microservices
Monitoring microservices
 
When Security Tools Fail You
When Security Tools Fail YouWhen Security Tools Fail You
When Security Tools Fail You
 
Blockchain and Hook model of engagement
Blockchain and Hook model of engagement Blockchain and Hook model of engagement
Blockchain and Hook model of engagement
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Professional Hacking in 2011
Professional Hacking in 2011Professional Hacking in 2011
Professional Hacking in 2011
 
FUEL_USERS_GROUP
FUEL_USERS_GROUPFUEL_USERS_GROUP
FUEL_USERS_GROUP
 
Heartbleed
HeartbleedHeartbleed
Heartbleed
 
Owning windows 8 with human interface devices
Owning windows 8 with human interface devicesOwning windows 8 with human interface devices
Owning windows 8 with human interface devices
 
Rewriting DevOps
Rewriting DevOpsRewriting DevOps
Rewriting DevOps
 
THOTCON 0x6: Going Kinetic on Electronic Crime Networks
THOTCON 0x6: Going Kinetic on Electronic Crime NetworksTHOTCON 0x6: Going Kinetic on Electronic Crime Networks
THOTCON 0x6: Going Kinetic on Electronic Crime Networks
 
From SLO to GOTY
From SLO to GOTYFrom SLO to GOTY
From SLO to GOTY
 
Chapter 15 incident handling
Chapter 15 incident handlingChapter 15 incident handling
Chapter 15 incident handling
 
2023 NCIT: Introduction to Intrusion Detection
2023 NCIT: Introduction to Intrusion Detection2023 NCIT: Introduction to Intrusion Detection
2023 NCIT: Introduction to Intrusion Detection
 
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose TutorialArchitecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
 
Luncheon 2016-07-16 - Topic 2 - Advanced Threat Hunting by Justin Falck
Luncheon 2016-07-16 -  Topic 2 - Advanced Threat Hunting by Justin FalckLuncheon 2016-07-16 -  Topic 2 - Advanced Threat Hunting by Justin Falck
Luncheon 2016-07-16 - Topic 2 - Advanced Threat Hunting by Justin Falck
 
Heartbleed Bug Vulnerability: Discovery, Impact and Solution
Heartbleed Bug Vulnerability: Discovery, Impact and SolutionHeartbleed Bug Vulnerability: Discovery, Impact and Solution
Heartbleed Bug Vulnerability: Discovery, Impact and Solution
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mind
 
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017  Succeeding in the Cloud – the guidebook of FailJax Devops 2017  Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
 
Super Easy Memory Forensics
Super Easy Memory ForensicsSuper Easy Memory Forensics
Super Easy Memory Forensics
 

Recently uploaded

原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
3ipehhoa
 
Internet of Things in Manufacturing: Revolutionizing Efficiency & Quality | C...
Internet of Things in Manufacturing: Revolutionizing Efficiency & Quality | C...Internet of Things in Manufacturing: Revolutionizing Efficiency & Quality | C...
Internet of Things in Manufacturing: Revolutionizing Efficiency & Quality | C...
CIOWomenMagazine
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
laozhuseo02
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
Arif0071
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
3ipehhoa
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
Gal Baras
 
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
Javier Lasa
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
Rogerio Filho
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
nirahealhty
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
JeyaPerumal1
 
Bài tập unit 1 English in the world.docx
Bài tập unit 1 English in the world.docxBài tập unit 1 English in the world.docx
Bài tập unit 1 English in the world.docx
nhiyenphan2005
 
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
ufdana
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
3ipehhoa
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
keoku
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
eutxy
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
JungkooksNonexistent
 
Comptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guideComptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guide
GTProductions1
 
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Brad Spiegel Macon GA
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
laozhuseo02
 

Recently uploaded (20)

原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
 
Internet of Things in Manufacturing: Revolutionizing Efficiency & Quality | C...
Internet of Things in Manufacturing: Revolutionizing Efficiency & Quality | C...Internet of Things in Manufacturing: Revolutionizing Efficiency & Quality | C...
Internet of Things in Manufacturing: Revolutionizing Efficiency & Quality | C...
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
 
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
 
Bài tập unit 1 English in the world.docx
Bài tập unit 1 English in the world.docxBài tập unit 1 English in the world.docx
Bài tập unit 1 English in the world.docx
 
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
 
Comptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guideComptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guide
 
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 

JavaLand 2022 - Debugging distributed systems

  • 2. Debugging distributed systems: the good parts bertjan@openvalue.eu Bert Jan Schrijver @bjschrijver Networking 101 How the internet works
  • 4. Bert Jan Schrijver L e t ’ s m e e t @bjschrijver
  • 5. Why are distributed systems difficult? Networking 101 What? Why? ✅ Demo War stories Conclusion W h a t ‘ s n e x t ? Outline A structured approach @bjschrijver
  • 6. What is a distributed system?
  • 7. A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another.
  • 8. • Concurrency of components • Lack of a global clock • Independent failure of components 
 • Distributed systems are harder to reason about Characteristics of distributed systems Source: http://www.nasa.gov/images/content/218652main_STOCC_FS_img_lg.jpg
  • 9. Working with distributed systems is fundamentally different from writing software on a single computer - and the main difference is that there are lots of new and exciting ways for things to go wrong. - Martin Kleppmann “ ” Photo: Dave Lehl
  • 10. What could possibly go wrong? “ ” Photo: Dave Lehl
  • 11. OSI & TCP/IP Source: https://www.guru99.com/difference-tcp-ip-vs-osi-model.html
  • 12. .. in your browser’s address bar and press Enter What happens when you type google.com… Source: https://github.com/alex/what-happens-when
  • 13. 13
  • 15. Step 1: Observe & document • What do you know about the problem? • Inspect logging, errors, metrics, tracing • Draw the path from source to target - what’s in between? Focus on details! • Document what you know • Can we reproduce in a test? • By injecting errors, for example Tools Whiteboard, documentation, logging, metrics, tracing (opentracing.io), tests, Jepsen
  • 16. Step 1: Observe & document
  • 17. Step 2: Create minimal reproducer • Goal: maximise the amount of debugging cycles • Focus on short development iterations / feedback loops • Get close to the action! Tools IDE, Shell scripts, SSH tunnels, Curl
  • 18. Step 3: Debug client side • Focus on eliminating anything that could be wrong on the client side • Are we connecting to the right host? • Do we send the right message? • Do we receive a response? • Not much different from local 
 debugging Tools IDE, debugger, logging
  • 19. Step 4: Check DNS & routing • DNS: • Make sure you know what IP address the hostname should resolve to • Verify that this actually happens 
 at the client • Routing: • Verify you can reach the 
 target machine Tools host, nslookup, dig, whois, ping, traceroute, nslookup.io, dnschecker.org
  • 20. Step 5: Check connection • Can we connect to the port? • If not, do we get a REJECT or a DROP? • Does the connection open and stay open? • Are we talking TLS? • What is the connection speed 
 between us? Tools telnet, nc, curl, iperf
  • 21. Step 6: Inspect traffic / messages • Do we send the right request? • Do we receive the right response? • How do we know? • How do we handle TLS? • Are there any load balancers 
 or proxies in between? Tools curl, wireshark, tcpdump, network tab in browser, mitm/tls proxy
  • 22. Step 7: Debug server side • Inspect the remote host • Can we attach a remote debugger? • See https://youtube.com/OpenValue • Profiling • Strace Tools SSH tunnels, remote debugger, profiler, strace
  • 23. Step 8: Wrap up & post mortem • Document the issue: • Timeline • What did we see? • Why did it happen? • What was the impact? • How did we find out? • What did we do to mitigate and fix? • What should we do to prevent 
 repetition? Tools Whiteboard, documentation
  • 24. If you really want a reliable system, you have to understand what its failure modes are. You have to actually have witnessed it misbehaving. - Jason Cahoon “ ”
  • 26. The time where it worked half of the time…
  • 27. The one at a school…
  • 28.
  • 29. The one with breaking news…
  • 30. Summary: a structured approach to debugging distributed systems @bjschrijver Check DNS & routing Check connection Debug client side Create minimal reproducer Debug server side Observe & document Wrap up & post mortem Inspect traffic / messages
  • 33. Thanks for your time. Got feedback? Tweet it! All pictures belong to their respective authors @bjschrijver