SlideShare a Scribd company logo
ShipItCon 2022
#sic2022
Your application
has died
and that’s OK!
Hello!
I am Anton Whalley
Partner Technical Specialist
Node.js Diagnostics Member – llnode maintainer
IBM Strategic Open Source Committer
Rust Dublin Co-Organiser
Certified Kubernetes Administrator
3
A (very) short
history of crash
analysis
Magnetic Cores created Mid 1950s
First Deployed by MIT/US Navy
Commercialised IBM 770
4
Modern Crash Analysis
5
Executable
Running Process
Terminated Process
Load
Abort
Snapshot
E.G gcore
Core Dump
*nix
Mini Dump
Windows
Runtime Specific
Reporting
WinDbg
gcc, lldb
Custom Tool
Files
Joyee Cheung – Bring Javascript Back To Life https://www.youtube.com/watch?v=XQIo9knnb2s
Why Crash Analysis
● Locked Down Production Environment
○ (Should be) Restricted Access
○ Debug tools not available
○ Heisenbugs
○ Unoptimized Builds
● Issues with logs
○ Only captures known/knowns
○ Requires adhoc updates
○ Stacktraces Don’t capture the full state
6
David Pacheco - https://dl.acm.org/doi/10.1145/2039359.2039361
Failure Types
7
Implicit
Explicit
Type Error
Uncaught Exception
Segmentation Fault
Panic
Hardware Error
Assertion Failure
Process Abort
Error Code Exit
Incorrect Result
Leaks Resources
Stops Doing Work
Pathological
Fatal
Non-fatal
Error Message
Returns Error Code
Bryan Cantrill – Docker in Production https://www.youtube.com/watch?v=AdMqCUhvRz8
CNCF Project Coverage
8
Implicit
Explicit
Fatal
Non-fatal
The birth place
of site reliability
engineering?
HMS Sailsbury 1747
First Controlled Experiment
Identified a cure for scurvy
40yrs to be adopted
In 1780 1457 admissions;
in 1806 there were 2.
9
The return of
scurvy!
HMS Alert 1875
85% of crew succumbed
Lemons changed for limes - 1860s
Misunderstood causes
The rise of steam engines meant
shorter trips
10
How We Forget
● Improvements in Adjacent Technology
● Questionable Refinements in Approach
● Solutions are not Accessible
● Concepts with Unbalanced weighting
● The Generation Gap
11
Crash Analysis in K8s
● Core Dump Handler
● Open Source
● Cloud Agnostic - xKS
● From 12 Separate
organisations contribute
12
https://github.com/IBM/core-dump-handler/
13
On Demand Crash Analysis
Augmented Pods with IDE
Integrated Into Git workflow
DX Infrastructure – gitpod.io
https://venshare.com/blog/gitops-coredump/
Future Work
14
- Automated Analysis
- Remove Sensitive Data
- Cloud Events
15
Thanks!
For Listening
You can find me at @dhight9 or @No9 GitHub
https://openjsf.org/blog/

More Related Content

Similar to Your application has died and that's OK

Advanced Debugging Techniques
Advanced Debugging TechniquesAdvanced Debugging Techniques
Advanced Debugging Techniques
David Szöke
 
Android Platform Debugging and Development
Android Platform Debugging and DevelopmentAndroid Platform Debugging and Development
Android Platform Debugging and Development
Opersys inc.
 
Критика "библиотечного" подхода в разработке под Android. UA Mobile 2016.
Критика "библиотечного" подхода в разработке под Android. UA Mobile 2016.Критика "библиотечного" подхода в разработке под Android. UA Mobile 2016.
Критика "библиотечного" подхода в разработке под Android. UA Mobile 2016.
UA Mobile
 
Creating a reasonable project boilerplate
Creating a reasonable project boilerplateCreating a reasonable project boilerplate
Creating a reasonable project boilerplate
Stanislav Petrov
 
Alexander Kutsan: “C++ compilation boost”
Alexander Kutsan: “C++ compilation boost” Alexander Kutsan: “C++ compilation boost”
Alexander Kutsan: “C++ compilation boost”
LogeekNightUkraine
 
Software maintenance PyConUK 2016
Software maintenance PyConUK 2016Software maintenance PyConUK 2016
Software maintenance PyConUK 2016
Cesar Cardenas Desales
 
Headless Android at AnDevCon3
Headless Android at AnDevCon3Headless Android at AnDevCon3
Headless Android at AnDevCon3
Opersys inc.
 
GeoServer Developers Workshop
GeoServer Developers WorkshopGeoServer Developers Workshop
GeoServer Developers Workshop
Jody Garnett
 
Why the yocto project for my io t project elc_edinburgh_2018
Why the yocto project for my io t project elc_edinburgh_2018Why the yocto project for my io t project elc_edinburgh_2018
Why the yocto project for my io t project elc_edinburgh_2018
Mender.io
 
StudioSL Presentation in Grenoble 2011
StudioSL Presentation in Grenoble 2011StudioSL Presentation in Grenoble 2011
StudioSL Presentation in Grenoble 2011
Enrico Scantamburlo
 
Real-World Docker: 10 Things We've Learned
Real-World Docker: 10 Things We've Learned  Real-World Docker: 10 Things We've Learned
Real-World Docker: 10 Things We've Learned
RightScale
 
Introduction to react native @ TIC NUST
Introduction to react native @ TIC NUSTIntroduction to react native @ TIC NUST
Introduction to react native @ TIC NUST
Waqqas Jabbar
 
Android Platform Debugging and Development
Android Platform Debugging and DevelopmentAndroid Platform Debugging and Development
Android Platform Debugging and Development
Opersys inc.
 
Troubleshooting .net core on linux
Troubleshooting .net core on linuxTroubleshooting .net core on linux
Troubleshooting .net core on linux
Pavel Klimiankou
 
Android Platform Debugging and Development
Android Platform Debugging and DevelopmentAndroid Platform Debugging and Development
Android Platform Debugging and Development
Opersys inc.
 
Ext GWT - Overview and Implementation Case Study
Ext GWT - Overview and Implementation Case StudyExt GWT - Overview and Implementation Case Study
Ext GWT - Overview and Implementation Case Study
Avi Perez
 
Perspectives on Docker
Perspectives on DockerPerspectives on Docker
Perspectives on Docker
RightScale
 
Webinar - Unbox GitLab CI/CD
Webinar - Unbox GitLab CI/CD Webinar - Unbox GitLab CI/CD
Webinar - Unbox GitLab CI/CD
Annie Huang
 
Android Platform Debugging and Development
Android Platform Debugging and DevelopmentAndroid Platform Debugging and Development
Android Platform Debugging and Development
Opersys inc.
 
CI/CD Pipeline mit Gitlab CI und Kubernetes
CI/CD Pipeline mit Gitlab CI und KubernetesCI/CD Pipeline mit Gitlab CI und Kubernetes
CI/CD Pipeline mit Gitlab CI und Kubernetes
inovex GmbH
 

Similar to Your application has died and that's OK (20)

Advanced Debugging Techniques
Advanced Debugging TechniquesAdvanced Debugging Techniques
Advanced Debugging Techniques
 
Android Platform Debugging and Development
Android Platform Debugging and DevelopmentAndroid Platform Debugging and Development
Android Platform Debugging and Development
 
Критика "библиотечного" подхода в разработке под Android. UA Mobile 2016.
Критика "библиотечного" подхода в разработке под Android. UA Mobile 2016.Критика "библиотечного" подхода в разработке под Android. UA Mobile 2016.
Критика "библиотечного" подхода в разработке под Android. UA Mobile 2016.
 
Creating a reasonable project boilerplate
Creating a reasonable project boilerplateCreating a reasonable project boilerplate
Creating a reasonable project boilerplate
 
Alexander Kutsan: “C++ compilation boost”
Alexander Kutsan: “C++ compilation boost” Alexander Kutsan: “C++ compilation boost”
Alexander Kutsan: “C++ compilation boost”
 
Software maintenance PyConUK 2016
Software maintenance PyConUK 2016Software maintenance PyConUK 2016
Software maintenance PyConUK 2016
 
Headless Android at AnDevCon3
Headless Android at AnDevCon3Headless Android at AnDevCon3
Headless Android at AnDevCon3
 
GeoServer Developers Workshop
GeoServer Developers WorkshopGeoServer Developers Workshop
GeoServer Developers Workshop
 
Why the yocto project for my io t project elc_edinburgh_2018
Why the yocto project for my io t project elc_edinburgh_2018Why the yocto project for my io t project elc_edinburgh_2018
Why the yocto project for my io t project elc_edinburgh_2018
 
StudioSL Presentation in Grenoble 2011
StudioSL Presentation in Grenoble 2011StudioSL Presentation in Grenoble 2011
StudioSL Presentation in Grenoble 2011
 
Real-World Docker: 10 Things We've Learned
Real-World Docker: 10 Things We've Learned  Real-World Docker: 10 Things We've Learned
Real-World Docker: 10 Things We've Learned
 
Introduction to react native @ TIC NUST
Introduction to react native @ TIC NUSTIntroduction to react native @ TIC NUST
Introduction to react native @ TIC NUST
 
Android Platform Debugging and Development
Android Platform Debugging and DevelopmentAndroid Platform Debugging and Development
Android Platform Debugging and Development
 
Troubleshooting .net core on linux
Troubleshooting .net core on linuxTroubleshooting .net core on linux
Troubleshooting .net core on linux
 
Android Platform Debugging and Development
Android Platform Debugging and DevelopmentAndroid Platform Debugging and Development
Android Platform Debugging and Development
 
Ext GWT - Overview and Implementation Case Study
Ext GWT - Overview and Implementation Case StudyExt GWT - Overview and Implementation Case Study
Ext GWT - Overview and Implementation Case Study
 
Perspectives on Docker
Perspectives on DockerPerspectives on Docker
Perspectives on Docker
 
Webinar - Unbox GitLab CI/CD
Webinar - Unbox GitLab CI/CD Webinar - Unbox GitLab CI/CD
Webinar - Unbox GitLab CI/CD
 
Android Platform Debugging and Development
Android Platform Debugging and DevelopmentAndroid Platform Debugging and Development
Android Platform Debugging and Development
 
CI/CD Pipeline mit Gitlab CI und Kubernetes
CI/CD Pipeline mit Gitlab CI und KubernetesCI/CD Pipeline mit Gitlab CI und Kubernetes
CI/CD Pipeline mit Gitlab CI und Kubernetes
 

Recently uploaded

Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 

Recently uploaded (20)

Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 

Your application has died and that's OK

  • 3. Hello! I am Anton Whalley Partner Technical Specialist Node.js Diagnostics Member – llnode maintainer IBM Strategic Open Source Committer Rust Dublin Co-Organiser Certified Kubernetes Administrator 3
  • 4. A (very) short history of crash analysis Magnetic Cores created Mid 1950s First Deployed by MIT/US Navy Commercialised IBM 770 4
  • 5. Modern Crash Analysis 5 Executable Running Process Terminated Process Load Abort Snapshot E.G gcore Core Dump *nix Mini Dump Windows Runtime Specific Reporting WinDbg gcc, lldb Custom Tool Files Joyee Cheung – Bring Javascript Back To Life https://www.youtube.com/watch?v=XQIo9knnb2s
  • 6. Why Crash Analysis ● Locked Down Production Environment ○ (Should be) Restricted Access ○ Debug tools not available ○ Heisenbugs ○ Unoptimized Builds ● Issues with logs ○ Only captures known/knowns ○ Requires adhoc updates ○ Stacktraces Don’t capture the full state 6 David Pacheco - https://dl.acm.org/doi/10.1145/2039359.2039361
  • 7. Failure Types 7 Implicit Explicit Type Error Uncaught Exception Segmentation Fault Panic Hardware Error Assertion Failure Process Abort Error Code Exit Incorrect Result Leaks Resources Stops Doing Work Pathological Fatal Non-fatal Error Message Returns Error Code Bryan Cantrill – Docker in Production https://www.youtube.com/watch?v=AdMqCUhvRz8
  • 9. The birth place of site reliability engineering? HMS Sailsbury 1747 First Controlled Experiment Identified a cure for scurvy 40yrs to be adopted In 1780 1457 admissions; in 1806 there were 2. 9
  • 10. The return of scurvy! HMS Alert 1875 85% of crew succumbed Lemons changed for limes - 1860s Misunderstood causes The rise of steam engines meant shorter trips 10
  • 11. How We Forget ● Improvements in Adjacent Technology ● Questionable Refinements in Approach ● Solutions are not Accessible ● Concepts with Unbalanced weighting ● The Generation Gap 11
  • 12. Crash Analysis in K8s ● Core Dump Handler ● Open Source ● Cloud Agnostic - xKS ● From 12 Separate organisations contribute 12 https://github.com/IBM/core-dump-handler/
  • 13. 13 On Demand Crash Analysis Augmented Pods with IDE Integrated Into Git workflow DX Infrastructure – gitpod.io https://venshare.com/blog/gitops-coredump/
  • 14. Future Work 14 - Automated Analysis - Remove Sensitive Data - Cloud Events
  • 15. 15 Thanks! For Listening You can find me at @dhight9 or @No9 GitHub https://openjsf.org/blog/

Editor's Notes

  1. Crash analysis is how we actually used to debug Extracting information from a malfunctioning application used to be how we debugged.
  2. Improved Ship Technology == Improved development, practices code debuggers Focus on Backtraces – lemons swapped for limes Lemons no longer accessible – make tooling easy to find and use Biological Causes == The concept of uptime lead to starved connections/DNS failures The generational gap – muscle memory of the organisaton fails
  3. https://github.com/IBM/core-dump-handler/