3. Hello!
I am Anton Whalley
Partner Technical Specialist
Node.js Diagnostics Member – llnode maintainer
IBM Strategic Open Source Committer
Rust Dublin Co-Organiser
Certified Kubernetes Administrator
4. A (very) short history of crash analysis
Magnetic Cores created Mid 1950s
First Deployed by MIT/US Navy
Commercialised IBM 770
5. Modern Crash Analysis
Executable
Running Process
Terminated Process
Load
Abort
Snapshot
e.g. gcore
Core Dump
*nix
Mini Dump
Windows
Runtime Specific
Reporting
WinDbg
gdb, lldb
Custom Tool
Files
Joyee Cheung – Bring Javascript Back To Life https://www.youtube.com/watch?v=XQIo9knnb2s
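The Running Process → Abort → Terminated Process path above can be sketched in a few lines. This is a minimal illustration, not tooling from the talk: it assumes a POSIX system, and the helper names `run_and_abort` and `allow_core_dumps` are made up for this example. Whether a core file is actually written, and where, depends on the system's core dump configuration.

```python
import os
import resource
import signal
import subprocess
import sys


def run_and_abort():
    """Start a child Python process that aborts itself; return its return code."""

    def allow_core_dumps():
        # Runs in the child only: raise the soft core-size limit to the hard limit
        # so the kernel is allowed to write a core dump if the system is set up for it.
        _, hard = resource.getrlimit(resource.RLIMIT_CORE)
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

    child = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        preexec_fn=allow_core_dumps,
        stderr=subprocess.DEVNULL,
    )
    return child.returncode


returncode = run_and_abort()
# On POSIX, a return code of -N means the child was terminated by signal N.
terminated_by_abort = returncode == -signal.SIGABRT
print("terminated by SIGABRT:", terminated_by_abort)
```

If a core file is produced, it can be opened afterwards with one of the debuggers listed above.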
6. Why Crash Analysis
● Locked Down Production Environment
○ (Should be) Restricted Access
○ Debug tools not available
○ Heisenbugs
○ Unoptimized Builds
● Issues with logs
○ Only capture known knowns
○ Require ad hoc updates
○ Stack traces don’t capture the full state
David Pacheco - https://dl.acm.org/doi/10.1145/2039359.2039361
7. Failure Types
Implicit
Explicit
Type Error
Uncaught Exception
Segmentation Fault
Panic
Hardware Error
Assertion Failure
Process Abort
Error Code Exit
Incorrect Result
Leaks Resources
Stops Doing Work
Pathological
Fatal
Non-fatal
Error Message
Returns Error Code
Bryan Cantrill – Docker in Production https://www.youtube.com/watch?v=AdMqCUhvRz8
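Some of the failures above end the process, and the way it ended is visible to the parent. The sketch below maps a child's return code to a few of the labels on this slide. It assumes POSIX conventions (a negative return code from `subprocess` means "killed by signal"); `classify_exit` is a hypothetical helper, not part of any tool mentioned in the talk. Failures such as Incorrect Result or Leaks Resources leave no exit status at all, which is why they cannot be classified this way.

```python
import signal
import subprocess
import sys


def classify_exit(returncode):
    """Map a subprocess return code to a coarse failure label (POSIX semantics)."""
    if returncode == 0:
        return "clean exit"
    if returncode > 0:
        return f"error code exit ({returncode})"
    # Negative values: the process was terminated by signal number -returncode.
    sig = signal.Signals(-returncode)
    labels = {
        signal.SIGSEGV: "segmentation fault",
        signal.SIGABRT: "process abort",
    }
    return labels.get(sig, f"killed by {sig.name}")


# Usage: a child that exits with an explicit error code.
child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
print(classify_exit(child.returncode))
```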
9. The birthplace of site reliability engineering?
HMS Salisbury 1747
First Controlled Experiment
Identified a cure for scurvy
40 years to be adopted
In 1780 there were 1,457 admissions; in 1806 there were 2.
10. The return of scurvy!
HMS Alert 1875
85% of crew succumbed
Lemons changed for limes - 1860s
Misunderstood causes
The rise of steam engines meant shorter trips
11. How We Forget
● Improvements in Adjacent Technology
● Questionable Refinements in Approach
● Solutions are not Accessible
● Concepts with Unbalanced weighting
● The Generation Gap
12. Crash Analysis in K8s
● Core Dump Handler
● Open Source
● Cloud Agnostic - xKS
● Contributors from 12 separate organisations
https://github.com/IBM/core-dump-handler/
13. On Demand Crash Analysis
Augmented Pods with IDE
Integrated Into Git workflow
DX Infrastructure – gitpod.io
https://venshare.com/blog/gitops-coredump/
Crash analysis is how we actually used to debug
Extracting information from a malfunctioning application used to be how we debugged.
Improved ship technology == improved development practices and code debuggers
Focus on Backtraces – lemons swapped for limes
Lemons no longer accessible – make tooling easy to find and use
Biological causes == the concept of uptime led to starved connections/DNS failures
The generation gap – the muscle memory of the organisation fails