The Hurricane's Butterfly: Debugging pathologically performing systems

bcantrill

Talk given as a Jane Street Tech Talk (https://www.janestreet.com/tech-talks/hurricanes-butterfly/); video to come.

The Hurricane’s Butterfly
Debugging pathologically performing systems
Bryan Cantrill
CTO
bryan@joyent.com
@bcantrill
Debugging system failure
• Failures are easiest to debug when they are explicit and fatal
• A system that fails fatally stops: it ceases to make forward
progress, leaving behind a snapshot of its state — a core dump
• Unfortunately, these are not all problems…
• A broad class of problems are non-fatal: the system continues
to operate despite having failed, often destroying evidence
• Worst of all are those non-fatal failures that are also implicit
Implicit, non-fatal failure
• The most difficult, time-consuming bugs to debug are those in
which the system failure is unbeknownst to the system itself
• The system does the wrong thing or returns the wrong result or
has pathological side effects (e.g., resource leaks)
• Of these, the gnarliest class is those failures that are not,
strictly speaking, failures at all: the system is operating correctly,
but is failing to operate in a timely or efficient fashion
• That is, it just… sucks
The stack of abstraction
• Our software systems are built as stacks of abstraction
• These stacks allow us to stand on the shoulders of history — to
reuse components without rebuilding them
• We can do this because of the software paradox: software is
both information and machine, exhibiting properties of both
• Our stacks are higher and run deeper than we can see or know:
software is silent and opaque; the nature of abstraction is to
seal us from what runs beneath!
• They run so deep as to challenge our definition of software…
The Butterflies
• When the stack of abstraction performs pathologically, its power
transmogrifies to peril: layering amplifies performance
pathologies but hinders insight
• Work amplifies as we go down the stack
• Latency amplifies as we go up the stack
• Seemingly minor issues in one layer can cascade into systemic
pathological performance
• These are the butterflies that cause hurricanes
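
The two amplification effects above can be made concrete with a toy model. What follows is a minimal sketch, not anything from the talk: the layer names, fan-out counts, and microsecond figures are all hypothetical, and it assumes each layer issues its calls to the layer below sequentially.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str           # hypothetical layer name, for illustration only
    fanout: int         # sequential calls into the layer below, per operation
    overhead_us: float  # time spent in this layer itself, per operation


def bottom_ops(stack):
    """Work amplifies going down: bottom-layer operations per top-layer operation."""
    ops = 1
    for layer in stack[:-1]:  # the bottom layer has nothing beneath it
        ops *= layer.fanout
    return ops


def top_latency_us(stack):
    """Latency amplifies going up: latency of one top-layer operation."""
    latency = 0.0
    for layer in reversed(stack):  # start at the bottom and work upward
        latency = layer.overhead_us + layer.fanout * latency
    return latency


def make_stack(disk_us):
    # Hypothetical four-layer stack, listed top to bottom.
    return [
        Layer("application", fanout=4, overhead_us=50.0),
        Layer("database", fanout=8, overhead_us=20.0),
        Layer("filesystem", fanout=2, overhead_us=10.0),
        Layer("disk", fanout=0, overhead_us=disk_us),
    ]


healthy = make_stack(disk_us=100.0)
degraded = make_stack(disk_us=1000.0)  # the "butterfly": a slower bottom layer

print(bottom_ops(healthy))        # disk operations per application request
print(top_latency_us(healthy))    # request latency with a healthy disk
print(top_latency_us(degraded))   # request latency with a degraded disk
```

In this model every microsecond added at the bottom layer is multiplied by the product of the fan-outs above it, which is why a seemingly minor slowdown deep in the stack can dominate the latency seen at the top.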
Butterfly I: ARC-induced black hole

انتزاع و هزینه - انتزاع و تاثیرات آن در توسعه و نگهداری نرم‌افزارانتزاع و هزینه - انتزاع و تاثیرات آن در توسعه و نگهداری نرم‌افزار
انتزاع و هزینه - انتزاع و تاثیرات آن در توسعه و نگهداری نرم‌افزار
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
CSS Notes in PDF, Easy to understand. For beginner to advanced. ...
CSS Notes in PDF, Easy to understand. For beginner to advanced.              ...CSS Notes in PDF, Easy to understand. For beginner to advanced.              ...
CSS Notes in PDF, Easy to understand. For beginner to advanced. ...
 
How AI is preventing account fraud at web scale
How AI is preventing account fraud at web scaleHow AI is preventing account fraud at web scale
How AI is preventing account fraud at web scale
 
Role of DevOps in SaaS product Development.pdf.pptx
Role of DevOps in SaaS product Development.pdf.pptxRole of DevOps in SaaS product Development.pdf.pptx
Role of DevOps in SaaS product Development.pdf.pptx
 
Joseph Yoder : Being Agile about Architecture
Joseph Yoder : Being Agile about ArchitectureJoseph Yoder : Being Agile about Architecture
Joseph Yoder : Being Agile about Architecture
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a Flow
 
Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration Topics
 
Agile & Scrum, Certified Scrum Master! Crash Course
Agile & Scrum,  Certified Scrum Master! Crash CourseAgile & Scrum,  Certified Scrum Master! Crash Course
Agile & Scrum, Certified Scrum Master! Crash Course
 
2024 Trends Transforming Enterprise Resource Planning
2024 Trends Transforming Enterprise Resource Planning2024 Trends Transforming Enterprise Resource Planning
2024 Trends Transforming Enterprise Resource Planning
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaS
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using Globus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Design pattern talk by Kaya Weers - 2024
Design pattern talk by Kaya Weers - 2024Design pattern talk by Kaya Weers - 2024
Design pattern talk by Kaya Weers - 2024
 
Passbolt Introduction and Usage for secret managment
Passbolt Introduction and Usage for secret managmentPassbolt Introduction and Usage for secret managment
Passbolt Introduction and Usage for secret managment
 
Managing multicast/igmp stream on Docker
Managing multicast/igmp stream on DockerManaging multicast/igmp stream on Docker
Managing multicast/igmp stream on Docker
 

The Hurricane's Butterfly: Debugging pathologically performing systems

  • 1. The Hurricane’s Butterfly: Debugging pathologically performing systems
      • Bryan Cantrill, CTO (bryan@joyent.com, @bcantrill)
  • 2. Debugging system failure
      • Failures are easiest to debug when they are explicit and fatal
      • A system that fails fatally stops: it ceases to make forward progress, leaving behind a snapshot of its state (a core dump)
      • Unfortunately, not all problems are like this…
      • A broad class of problems are non-fatal: the system continues to operate despite having failed, often destroying evidence
      • Worst of all are those non-fatal failures that are also implicit
  • 3. Implicit, non-fatal failure
      • The most difficult, time-consuming bugs to debug are those in which the system failure is unbeknownst to the system itself
      • The system does the wrong thing or returns the wrong result or has pathological side effects (e.g., resource leaks)
      • Of these, the gnarliest class are those failures that are not, strictly speaking, failures at all: the system is operating correctly, but is failing to operate in a timely or efficient fashion
      • That is, it just… sucks
  • 4. The stack of abstraction
      • Our software systems are built as stacks of abstraction
      • These stacks allow us to stand on the shoulders of history: to reuse components without rebuilding them
      • We can do this because of the software paradox: software is both information and machine, exhibiting properties of both
      • Our stacks are higher and run deeper than we can see or know: software is silent and opaque; the nature of abstraction is to seal us from what runs beneath!
      • They run so deep as to challenge our definition of software…
  • 5. The Butterflies
      • When the stack of abstraction performs pathologically, its power transmogrifies to peril: layering amplifies performance pathologies but hinders insight
      • Work amplifies as we go down the stack
      • Latency amplifies as we go up the stack
      • Seemingly minor issues in one layer can cascade into systemic pathological performance
      • These are the butterflies that cause hurricanes
  • 7. Butterfly II: Disk reader starvation
  • 8. Butterfly III: Kernel page-table isolation
      • Data courtesy Scaleway, running a PHP workload with KPTI patches for Linux. Thank you Edouard Bonlieu and team!
  • 9. The Hurricane
      • With pathologically performing systems, we are faced with Leventhal’s Conundrum: given a hurricane, find the butterflies!
      • This is excruciatingly difficult:
          • Symptoms are often far removed from root cause
          • There may not be a single root cause but several
          • The system is dynamic and may change without warning
          • Improvements to the system are hard to model and verify
      • Emphatically, this is not “tuning”: it is debugging
  • 10. Performance debugging
      • When we think of it as debugging, we can stop pretending that understanding (and rectifying) pathological system performance is rote or mechanical, or easy
      • We can resist the temptation to be guided by folklore: just because someone heard about something causing a problem once doesn’t mean it’s the problem now!
      • We can resist the temptation to change the system before understanding it: just as you wouldn’t (or shouldn’t!) debug by just changing code, you shouldn’t debug a pathologically performing system by randomly altering it!
  • 11. How do we debug?
      • To debug methodically, we must resist the temptation to form quick hypotheses, focusing rather on questions and observations
      • Iterating between questions and observations gathers the facts that will constrain future hypotheses
      • These facts can be used to disconfirm hypotheses!
      • How do we ask questions?
      • How do we make observations?
  • 12. Asking questions
      • For performance debugging, the initial question formulation is particularly challenging: where does one start?
      • Resource-centric methodologies like the USE Method (Utilization/Saturation/Errors) can be excellent starting points…
      • But keep these methodologies in their context: they provide initial questions to ask; they are not recipes for debugging arbitrary performance pathologies!
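As a rough illustration of how a resource-centric pass such as the USE Method can seed initial questions, here is a minimal Python sketch. The `ResourceSample` type, the `use_questions` function, and the 0.7 utilization threshold are all hypothetical names and values chosen for illustration; they are not part of the talk or of any tool. Note that the output is a list of questions to investigate, not a diagnosis.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One observation of a resource (hypothetical shape, for illustration)."""
    name: str
    utilization: float  # fraction of time busy, 0.0 to 1.0
    saturation: int     # amount of queued or waiting work
    errors: int         # count of error events


def use_questions(samples, util_threshold=0.7):
    """Turn USE-style samples into questions to ask, not conclusions."""
    questions = []
    for s in samples:
        if s.errors > 0:
            questions.append(f"{s.name}: what are the {s.errors} errors?")
        if s.saturation > 0:
            questions.append(
                f"{s.name}: why is work queued (depth {s.saturation})?")
        if s.utilization >= util_threshold:
            questions.append(
                f"{s.name}: what is keeping it {s.utilization:.0%} busy?")
    return questions
```

Calling `use_questions` on a busy, queued disk and an idle CPU would yield two questions about the disk and none about the CPU; answering them still requires observation.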
  • 13. Making observations
      • Questions are answered through observation
      • The observability of the system is paramount
      • If the system cannot be observed, one is reduced to guessing, making changes, and drawing inferences
      • If it must be said, drawing inferences based only on change is highly flawed: correlation does not imply causation!
      • To be observable, systems must be instrumentable: they must be able to be altered to emit a datum in the desired condition
  • 14. Observability through instrumentation
      • Static instrumentation modifies source to provide semantically relevant information, e.g., via logging or counters
      • Dynamic instrumentation allows for the system to be changed while running to emit data, e.g., DTrace, OpenTracing
      • Both mechanisms of instrumentation are essential!
      • Static instrumentation provides the observations necessary for early question formulation…
      • Dynamic instrumentation answers deeper, ad hoc questions
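To make static instrumentation concrete, the following is a minimal Python sketch of counters added directly in source. The `instrumented` decorator, the `calls` and `total_ns` counters, and the `parse_record` function are hypothetical names invented for this example. It shows only the static case; it is not a stand-in for dynamic instrumentation, which changes a running system without editing its source.

```python
import time
from collections import Counter
from functools import wraps

# Static instrumentation: counters compiled into the program itself.
calls = Counter()      # function name -> number of calls
total_ns = Counter()   # function name -> cumulative wall time in nanoseconds


def instrumented(fn):
    """Count calls and accumulate elapsed time for the wrapped function."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter_ns()
        try:
            return fn(*args, **kwargs)
        finally:
            # Recorded even if fn raises, so failures are still counted.
            calls[fn.__name__] += 1
            total_ns[fn.__name__] += time.perf_counter_ns() - start
    return wrapper


@instrumented
def parse_record(line):
    """Hypothetical unit of work whose cost we want to observe."""
    return line.split(",")
```

Counters like these can only answer the questions anticipated when the code was written, which is why they suit early question formulation rather than deeper, ad hoc questions.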
  • 15. Aside: Monitoring vs. observability
      • Monitoring is an essential operational activity that can indicate a pathologically performing system and provide initial questions
      • But monitoring alone is often insufficient to completely debug a pathologically performing system, because the questions that it can answer are limited to that which is monitored
      • As we increasingly deploy developed systems rather than received ones, it is a welcome (and unsurprising!) development to see the focus of monitoring expand to observability!
  • 16. Aggregation
      • When a system is instrumented, it can become overwhelmed by the overhead of instrumentation
      • Aggregation is essential for scalable, non-invasive instrumentation, and is a first-class primitive in (e.g.) DTrace
      • But aggregation also eliminates important dimensions of data, especially with respect to time; some questions may only be answered with disaggregated data!
      • Use aggregation for performance debugging, but also understand its limits!
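The trade-off above can be sketched in a few lines of Python. The `quantize` function below is a hypothetical helper written for this example: it buckets non-negative integer samples by power of two, loosely in the spirit of a power-of-two distribution such as DTrace's `quantize()` aggregating action, but it is not DTrace's implementation.

```python
from collections import Counter


def quantize(values):
    """Aggregate non-negative integers into power-of-two buckets.

    Returns {bucket_floor: count}, where bucket_floor is the largest
    power of two less than or equal to the value (0 maps to bucket 0).
    """
    buckets = Counter()
    for v in values:
        key = 0 if v == 0 else 1 << (v.bit_length() - 1)
        buckets[key] += 1
    return dict(sorted(buckets.items()))


# Two runs with the same samples in a different order: in the first the
# outlier occurs mid-run, in the second it occurs at the very start.
run_a = [3, 4, 5, 900, 4, 3]
run_b = [900, 3, 3, 4, 4, 5]
```

Both runs aggregate to the same distribution, so the aggregate alone cannot say when the 900 outlier happened; that question needs the disaggregated, time-ordered data.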
  • 17. Visualization
      • The visual cortex is unparalleled at detecting patterns
      • The value of visualizing data is not merely providing answers, but also (and especially) provoking new questions
      • Our systems are so large, complicated and abstract that there is not one way to visualize them, but many
      • The visualization of systems and their representations is an essential skill for performance debugging!
  • 18. Visualization: Gnuplot
      • Graphs are terrific, so much so that we should not restrict ourselves to the captive graphs found in bundled software!
      • An ad hoc plotting tool is essential for performance debugging; and Gnuplot is an excellent (if idiosyncratic) one
      • Gnuplot is easily combined with workhorses like awk or perl
      • That Gnuplot is an essential tool helps to set expectations around performance debugging tools: they are not magicians!
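The slide names awk and perl as the usual companions; purely as an illustration of the same ad hoc workflow, here is a small Python sketch that massages samples into the whitespace-separated columns Gnuplot reads and emits a matching plot script. The function names, the file name `latency.dat`, and the axis labels are assumptions made up for this example.

```python
def to_columns(samples):
    """Render (timestamp, value) pairs as whitespace-separated columns."""
    return "".join(f"{t} {v}\n" for t, v in samples)


def gnuplot_script(data_path, ylabel):
    """Build a minimal Gnuplot script plotting column 2 against column 1."""
    return "\n".join([
        'set xlabel "time (s)"',
        f'set ylabel "{ylabel}"',
        f'plot "{data_path}" using 1:2 with lines title "{ylabel}"',
    ]) + "\n"


def write_plot_files(samples, data_path="latency.dat", script_path="plot.gp"):
    """Write the data file and script to disk (hypothetical file names)."""
    with open(data_path, "w") as f:
        f.write(to_columns(samples))
    with open(script_path, "w") as f:
        f.write(gnuplot_script(data_path, "latency (ms)"))
```

After `write_plot_files(...)`, running `gnuplot -persist plot.gp` should display the graph, assuming Gnuplot is installed with an interactive terminal available.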
  • 21. Visualization: Statemaps
      • Especially when trying to understand interplay between different entities, it can be useful to visualize their state over time
      • Time is the critical element here!
      • We are experimenting with statemaps whereby state transitions are instrumented (e.g., with DTrace) and then visualized
      • This is not necessarily a new way of visualizing the system (e.g., early thread debuggers often showed thread state over time), but with a new focus on post hoc visualization
      • Primordial implementation: https://github.com/joyent/statemap
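The idea of recording state transitions and processing them after the fact can be sketched as follows. This is an illustration only: the event tuple layout, the entity and state names, and the `time_in_state` function are hypothetical, and this is not the input format or the logic of the statemap tool linked above. It also reduces the transitions to per-state totals, which is itself an aggregation; a statemap keeps the time axis.

```python
from collections import defaultdict


def time_in_state(transitions, end_time):
    """Sum time spent in each state per entity.

    transitions: iterable of (time, entity, new_state), sorted by time.
    end_time: time at which observation stopped.
    Returns {entity: {state: total_time}}.
    """
    current = {}  # entity -> (state, time the state was entered)
    totals = defaultdict(lambda: defaultdict(int))
    for t, entity, state in transitions:
        if entity in current:
            prev_state, since = current[entity]
            totals[entity][prev_state] += t - since
        current[entity] = (state, t)
    # Close out whatever state each entity was in when observation ended.
    for entity, (state, since) in current.items():
        totals[entity][state] += end_time - since
    return {e: dict(s) for e, s in totals.items()}


# Hypothetical transitions for two threads.
events = [
    (0, "t1", "running"),
    (0, "t2", "blocked"),
    (4, "t1", "blocked"),
    (6, "t2", "running"),
    (7, "t1", "running"),
]
```

With `end_time=10`, thread `t1` spends 7 units running and 3 blocked, while `t2` spends 6 blocked and 4 running; plotting each entity's intervals on a shared time axis is what exposes the interplay between them.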
  • 27. The hurricane’s butterfly
      • Finding the source(s) of pathologically performing systems must be thought of as debugging, albeit the hardest kind
      • Debugging isn’t about making guesses; it’s about asking questions and answering them with observations
      • We must enshrine observability to assure debuggability!
      • Debugging rewards persistence, grit, and resilience more than intuition or insight: it is more perspiration than inspiration!
      • We must have the faith that our systems are, in the end, purely synthetic; we can find the hurricane’s butterfly!