Rust Dublin - Post Mortem Tools
1. Bring out your dead!
Using Rust to manage fatal process crashes and provide post-mortem analysis
Anton Whalley
t: @dhigit9
gh: @No9
https://pilgrimagemedievalireland.com/2016/10/30/irish-halloween-traditions/
4. Why Fatal Observability
• In situ debugging is untenable in a production environment
  Access
  Tools
  Heisenbugs
  Unoptimized builds
• Log-based debugging
  Only captures the known knowns
  Requires risky ad-hoc updates
  A stack trace doesn’t capture the full process state
Dave Pacheco ACM 2011 - https://cacm.acm.org/magazines/2011/12/142525-postmortem-debugging-in-dynamic-environments/fulltext
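One way to capture the full process state that a stack trace misses is a core dump. The sketch below assumes a Linux host where a handler is registered in /proc/sys/kernel/core_pattern as `|/path/to/handler %p %e`, so the kernel pipes the core image to the handler's stdin and passes the PID and executable name as arguments. The handler path, the argument order, and the names `CrashInfo`, `parse_args` and `capture_core` are hypothetical and not taken from the talk; `main` uses in-memory stand-ins for the real arguments and stdin so it runs anywhere.

```rust
use std::io::{self, Cursor, Read, Write};

/// Crash metadata, assuming the kernel invokes the handler with
/// `%p` (PID) and `%e` (executable name) as its two arguments.
#[derive(Debug, PartialEq)]
struct CrashInfo {
    pid: u32,
    executable: String,
}

/// Parse `<program> <pid> <executable>`; returns None if anything is missing
/// or the PID is not a number.
fn parse_args(args: &[String]) -> Option<CrashInfo> {
    let pid = args.get(1)?.parse().ok()?;
    let executable = args.get(2)?.clone();
    Some(CrashInfo { pid, executable })
}

/// Stream the core image from `input` to `output`, returning the bytes copied.
fn capture_core<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<u64> {
    io::copy(&mut input, &mut output)
}

fn main() -> io::Result<()> {
    // Stand-ins for std::env::args() and std::io::stdin().
    let args = vec![
        "handler".to_string(),
        "4242".to_string(),
        "myapp".to_string(),
    ];
    let fake_core = Cursor::new(vec![0x7f, b'E', b'L', b'F']);

    let info = parse_args(&args).expect("expected <pid> <executable>");
    let mut stored = Vec::new();
    let bytes = capture_core(fake_core, &mut stored)?;
    println!(
        "captured {} bytes from {} (pid {})",
        bytes, info.executable, info.pid
    );
    Ok(())
}
```

In a real handler the `Vec` would be replaced by a file or an upload stream, since core images can be far larger than available memory.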
5. Prior Art
Joyent Thoth
https://github.com/joyent/manta-thoth
• Node.js source
• illumos based
• Requires Manta
• Advanced Search / Grouping / Ticket Capabilities
• On demand debugging
Fujitsu – Core Dump Node Detector
https://github.com/fenggw-fnst/coredump-node-detector
• C & bash source
• Kubernetes based
• No Core Dump Management
6. Core Dump Handler Design Goals
• As simple as possible to install and use
  One-line install & one-line CLI session
• Facilitate debugging
  CLI to run debug sessions with preinstalled tools
• Only use established technology as a dependency
  S3, kubectl, zip, lldb
• Target all flavours of Kubernetes, but remain usable at the OS level
• Clear configuration layout
  Diagrams and documentation on deployed assets
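As a rough illustration of the "debug session with preinstalled tools" goal, the sketch below builds the lldb invocation for a core file that has already been fetched and unzipped. It is not the project's CLI: the function name `lldb_args` and the file paths are hypothetical, and only lldb's standard `--core` option is assumed. The command is printed rather than spawned so the example runs without lldb installed.

```rust
use std::process::Command;

/// Arguments for opening a core file in lldb:
/// `lldb <executable> --core <core_file>`.
fn lldb_args(executable: &str, core_file: &str) -> Vec<String> {
    vec![
        executable.to_string(),
        "--core".to_string(),
        core_file.to_string(),
    ]
}

fn main() {
    // Hypothetical paths, e.g. after unzipping a bundle downloaded from S3.
    let args = lldb_args("./myapp", "./myapp.core");

    let mut cmd = Command::new("lldb");
    cmd.args(&args);

    // Print instead of calling cmd.status() so no debugger is required.
    println!("{:?}", cmd);
}
```

Swapping the `println!` for `cmd.status()` would hand the terminal over to an interactive lldb session.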
10. Future Work
• On Demand Dumps – Probably with gcore / runtime facilities
• Deeper xKS support - EKS metadata, CoreOS
• Categorisation, Scrubbing, Alerts (Likely a separate project)
• More Documentation
• Wider Language Support (Go, Python?, .NET Core?)
• Better source code flow
• Signed Packages
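The on-demand dump item could look something like the sketch below, which assumes gdb's `gcore` utility and its documented form `gcore -o <prefix> <pid>`, writing the dump to `<prefix>.<pid>`. This is only one possible shape for future work, not an existing feature; the function names, the PID and the `/tmp/ondemand` prefix are hypothetical. The command is printed rather than run so the example works without gdb installed.

```rust
use std::process::Command;

/// Arguments for `gcore -o <prefix> <pid>`.
fn gcore_args(prefix: &str, pid: u32) -> Vec<String> {
    vec!["-o".to_string(), prefix.to_string(), pid.to_string()]
}

/// gcore composes its output file name as `<prefix>.<pid>`.
fn expected_core_path(prefix: &str, pid: u32) -> String {
    format!("{}.{}", prefix, pid)
}

fn main() {
    // Hypothetical target process and output prefix.
    let pid = 4242;
    let prefix = "/tmp/ondemand";

    let mut cmd = Command::new("gcore");
    cmd.args(gcore_args(prefix, pid));

    // Print instead of running, since gcore needs a live target process.
    println!("{:?} -> {}", cmd, expected_core_path(prefix, pid));
}
```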
11. Thanks for watching!
Try it out on a FREE Kubernetes cluster
https://www.ibm.com/cloud/free/kubernetes
PLS GIVE IT A STAR
https://github.com/IBM/core-dump-handler/