3. AGENDA (a kanban board: To-Do / Doing / Done)
Story of a Team
Intro
Types of Reasoning
Back to the Team
Scientific Method
Deliberate Process
These Make It Easier
Watch Out For These
Other Uses For Experiments
Teaching This
Bonus Material!
Closing and References
5. CUSTOMARY “WHO AM I?” SLIDE
Shawn Button
• Developer
• Agile Coach
• Lever 21
• Pokémon Go Trainer
13. DEDUCTIVE REASONING
Starts with the assertion of a general rule and proceeds from there to a guaranteed specific conclusion.
e.g. Math: if x = 4 and y = 1, then 2x + y = 9.
e.g. All birds have feathers and swans are birds, so swans have feathers.
14. INDUCTIVE REASONING
Begins with observations and proceeds to a generalized
conclusion that is likely, but not certain, in light of
accumulated evidence.
e.g. All swans that I have seen are white, therefore all
swans are white.
15. ABDUCTIVE REASONING
The generation of new ideas (hypotheses) to explain
observations. Begins with an incomplete set of
observations and proceeds to the likeliest possible
explanation for the set.
16. ABDUCTIVE REASONING
Examples:
Medical diagnosis: given this set of symptoms, what is
the diagnosis that would best explain most of them?
Criminal trial: given the evidence, is the suspect more
likely innocent or guilty?
Swans get wet when they swim. That swan is wet,
therefore it was swimming, and therefore there must be
water near here.
17. ABDUCTIVE REASONING
Abductive reasoning is concerned with imaginative
reasoning, a process where new ideas or hypotheses
come into existence through observation.
We use abductive reasoning all of the time in order to
make sense of the world.
We build up a mental model of reality that is
constructed from hypotheses, which are based on
observations.
18. TYPES OF LOGICAL REASONING
Deductive Reasoning: General Rule → Specific Conclusion (true if rule is true)
Inductive Reasoning: Specific Observations → General Conclusion (may be true)
Abductive Reasoning: Incomplete Observations → Best Prediction (may be true)
31. THE SCIENTIFIC METHOD OF TROUBLESHOOTING
Observe Problem → Collect Observations → Create Hypotheses → Design Experiment → Perform Experiment → Evaluate Results
Falsified? Try another hypothesis. Supported? Refine hypotheses.
Every loop feeds Our Understanding Of The System, until: Solution!
33. OBSERVE PROBLEM
• A Problem is behavior that you didn’t expect
• First thing to do is figure out how it should behave
• Focus on external behavior. What are all of the things we should expect to see if it works?
• Ask: What input does it take? What output should it give? Etc.
• Write down the problem statement! If you can’t create a concise description you probably don’t understand it well enough.
• Can we recreate the problem? If not, we need to be able to before we can proceed.
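The "recreate the problem" step can be captured directly in code. A minimal sketch, with a made-up `parse_price` function standing in for whatever is misbehaving (none of these names come from the talk):

```python
def parse_price(text):
    """Stand-in for the code under investigation."""
    return int(text)

def recreate_problem():
    """Problem statement: parse_price("19.99") should return 19.99,
    but it raises ValueError instead.

    Returns True if the unexpected behaviour still reproduces."""
    try:
        parse_price("19.99")
        return False  # behaved as expected; problem is gone
    except ValueError:
        return True   # unexpected behaviour reproduced
```

Once the problem reproduces on demand like this, every later experiment has a cheap, repeatable check.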
35. COLLECT OBSERVATIONS
• Now collect some evidence. Write down any observations about what’s happening in the system.
• What actually happens when you recreate the problem? What inputs did you use? What are the outputs? Are there log files? Can we see other impacts, for example in a database or service?
• Spend some time digging here: the more time you spend, the better the quality of the hypotheses you are able to create in the next step.
37. CREATE HYPOTHESES
• Think of as many possible causes for the observed behaviour as
you can.
• These are your hypotheses. A hypothesis is an attempt to
understand and explain what is happening.
• Hypotheses can be wrong, and many will be.
• Write everything down!
• If you can’t form clear hypotheses, you might just not know
enough, and need to gather some more data.
• There’s nothing wrong with using Stack Overflow / Google to help
form hypotheses!
40. DESIGN AND CARRY OUT EXPERIMENTS
• Before you start your experiment, write down what you are doing, what you expect to see, and what it means if the experiment fails.
• It could be a description of the metrics you’re going to look at, the code or config you’re going to change, or the query you’re going to run.
• An experiment could also just be gathering data or measurements, for example from log files, metrics, or other visibility tools you have. For example: “if we look in the production logs we should see this log statement before the error”.
• One way of testing a hypothesis is by looking at the code, or stepping through a debugger.
• Again, write it down!
• E.g. “If I change this line in the config file, then I expect that this error should no longer appear in the log.”
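A hedged sketch of what "write it down before you run it" can look like in practice. The hypothesis, the change, and the `parse_price_patched` function are all invented for illustration:

```python
# Record the experiment before running it.
experiment = {
    "hypothesis": "parsing fails because the code uses int() instead of float()",
    "change": "replace int(text) with float(text)",
    "prediction": "parse_price('19.99') returns 19.99 without an exception",
}

def parse_price_patched(text):
    return float(text)  # the single change under test

# Perform the experiment, then compare observation to prediction.
observed = parse_price_patched("19.99")
hypothesis_supported = (observed == 19.99)
```

If `hypothesis_supported` is False, the hypothesis is falsified (or some other mistake was made) and the change should be reverted, exactly as the evaluate-results step says.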
42. EVALUATE RESULTS
• What happened when you performed the experiment?
• For complicated debugging it helps to keep a record of the results,
for example save screenshots or copies of log files.
• Did the results match your prediction? If not, your hypothesis
must be false (unless you made some other mistake).
• Do the results suggest a new hypothesis, or refinement to existing
hypotheses?
• If you changed something (e.g. code or config) and the results
aren't what you expected, you should undo the change.
44. EVERY EXPERIMENT BUILDS UNDERSTANDING
• If you are able to solve a problem without learning, the problem is just going to reoccur later.
• A validated hypothesis doesn’t necessarily mean you’ve fixed the problem. It might just be that you’ve learned something about the system. You might need to go through many experiment loops to learn enough to find the problem.
• The scientific method is actually a structured knowledge-acquisition process. Solving the problem is a happy fringe benefit!
47. EXERCISE!
As a table you are going to try out the scientific
method.
Pick a scenario from the back of the handout or,
better, have someone volunteer a real problem they
have or had.
48. EXERCISE!
As a table you are going to try out the scientific
method.
On the big sheet on the table make sections for:
Problem Statement
Observations
Hypotheses
49. EXERCISE!
As a table you are going to try out the scientific
method.
Together come up with each of these.
For hypotheses come up with the experiment
you will run, and how you will know if the
hypothesis is validated.
56. “First I had to become conscious while programming. I had been
programming for years… and I was astonished to discover that, even
though programming decisions came smoothly and quickly to me, I
couldn’t explain why…. The first step… was slowing down long
enough to become aware of what I was thinking, to stop pretending
that I coded by instinct”
WHY SO DELIBERATE WITH THE
PROCESS?
57. “I’m not a great programmer. I’m just a
good programmer with great habits”
WHY SO DELIBERATE WITH THE
PROCESS?
61. • Your memory isn’t as good as you think it is!
• Take notes in a notebook, whiteboard,
stickies, or a chat tool
• Provides clarity on the problem, hypotheses,
experiments and learning
• Avoids repeating experiments, or missing
connections
WRITE EVERYTHING DOWN!
62. The speed of running experiments is key. The
faster you go through the experiment loop the
faster you learn.
EXPERIMENT FASTER
63. • Isolate the problem in a smaller app that you
can run in isolation (Unit test tools like xUnit
are fantastic for this)
• Focus on faster experiments, for example
ones that only use logs from previous runs
• Tools help here! Become an expert in your
debugger, or in some languages, interactive
consoles
EXPERIMENT FASTER
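One of the bullets above suggests experiments that only use logs from previous runs. A sketch of that idea, with an invented log format: instead of restarting the app for each question, mine a saved log, so the experiment loop takes seconds rather than minutes.

```python
# Hypothetical saved output from a previous run of the app.
SAVED_LOG = """\
2024-01-01 10:00:01 INFO  loading config from app.properties
2024-01-01 10:00:02 ERROR failed to bind port 8080: address already in use
"""

def lines_before_error(log_text):
    """Return every log line recorded before the first ERROR line."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if " ERROR " in line:
            return lines[:i]
    return lines

# Hypothesis: the config was loaded before the failure appeared.
assert any("loading config" in line for line in lines_before_error(SAVED_LOG))
```

The same question answered by restarting the app would have cost the ten-minute restart the team in the story was suffering through.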
65. Often the problem is mired in a lot of
complexity. Can you find a way to recreate
the problem in a less complicated fashion?
• Extract the offending code into its own
method/class
• Refactor to clean it up
• Comment out extraneous lines
• Once you run your experiment you likely
want to revert these changes
SIMPLIFY
66. Empathize with the future person who has to
debug the problem (who might be you).
• Design in the ability to run experiments.
• Good, modular design lets you run things in
isolation. Single Responsibility Principle!
• Have unit tests. Use Test Driven Development.
• Use informative logging statements.
• Keep your logs clean.
• Good error messages beat documentation.
DESIGN FOR DEBUGGING
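The "good error messages beat documentation" bullet can be made concrete. A sketch (the function and config keys are illustrative, not from the talk): the error carries the observations a future debugger would otherwise have to dig for.

```python
def lookup_setting(settings, key):
    """Fetch a config value, failing with a message that already
    contains the evidence a future debugger will need, instead of
    a bare KeyError."""
    if key not in settings:
        raise KeyError(
            f"missing config key '{key}'; available keys: {sorted(settings)}"
        )
    return settings[key]
```

A stack trace that says which key was missing and which keys were present turns the "collect observations" step into reading one line.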
67. • Write Everything Down!
• Run Experiments Faster
• Simplify The System
• Create Systems That Support Troubleshooting
• Be Deliberate About The Process and Practice
THINGS THAT MAKE THIS EASIER
RECAP
71. • If you skip steps or try to hurry this, you will
immediately slow down.
• If you are a manager protect your developers
from pressure (and blame) when they are
debugging a critical problem.
• Assign someone to communicate status to
anxious management types and customers,
to allow the devs time to be systematic
• Take a break if you get stuck
PRESSURE AND RUSHING SLOWS
YOU DOWN
73. If you change more than one thing at a time it
is very hard to evaluate the results of your
experiment.
Take the smallest steps you can in order to
learn about the system.
ONLY CHANGE ONE VARIABLE AT
A TIME
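A sketch of why single-variable changes matter, loosely echoing the two-bug configuration story from the talk. The config keys and the stand-in "system" are invented for illustration:

```python
def run_system(config):
    """Stand-in for the real system: it only starts once BOTH
    problems in the config are fixed."""
    return config.get("port") == 8080 and config.get("driver") == "postgres"

baseline = {"port": 9999, "driver": "mysql"}
candidate_fixes = [("port", 8080), ("driver", "postgres")]

results = []
config = dict(baseline)
for key, value in candidate_fixes:
    config[key] = value                 # change exactly one variable
    results.append((key, run_system(config)))
```

Applied one at a time, the record shows the first fix alone was not enough and the second completed the repair. Applied together in one step, you would learn only that "something in that batch" mattered.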
74. You will unconsciously want the problem to have a
certain cause, and will pick or exclude hypotheses accordingly.
• You have “gut feelings” or assumptions about
what is wrong
• There is a part of the system you distrust or you
often see problems in a certain area
• We have a tendency to think a problem is
someone else’s fault
BIAS, ASSUMPTIONS, BLAME
75. BLAME
Before you blame someone else, run an experiment that conclusively proves it is someone else’s problem. Then provide them with the results of that experiment to help them debug the problem.
76. • Sometimes a problem can have more than
one root cause.
• Sometimes failures can interact with each
other (or even cancel!)
• Sometimes a tested supported hypothesis
can lead you down the wrong path.
COMPLEX FAILURES
77. • Don’t just randomly run experiments!
• Hypotheses are created based on
evidence.
• New hypotheses should be guided and
refined by your increasing knowledge of
the system you are investigating.
TRIAL AND ERROR
78. • Pressure and rushing slows you down
• Changing Multiple Variables confounds you
• Bias, Assumptions, Blame
• Complex Failures
• Trial and Error
WATCH OUT FOR
81. If you want to learn about a new
system, the same technique applies!
State your hypothesis about how the system
works, and then run your experiment.
You build up a mental model of the system
through structured experiments.
KNOWLEDGE ACQUISITION
82. Sometimes, it’s just easier to google the
problem. Don’t think this replaces all of the
90 times a day you search Stack Overflow.
STACK OVERFLOW AND GOOGLE
83. WHERE IS THE CREATIVITY?
Does this feel dry and mechanical? Remember: abductive reasoning is a creative process.
Finding the right hypothesis and drawing connections between experiments takes creativity, ingenuity, and experience.
More people helps (Mobbing ftw!)
Thinking outside of the “box”
84. • Knowledge Acquisition
• When Not To Use The Scientific Method
• Where Is The Creativity?
OTHER THINGS
87. PAIRING AND MOBBING
The best way to teach this is to demonstrate
it in action.
When a problem happens get everyone in a
room and walk through the process.
Use a big white-board or sticky notes on a
wall to record your notes.
88. TOYOTA KATA
89. TROUBLESHOOTING KATA QUESTIONS
Questions About The Problem
1. What do we know about the problem?
2. Is this enough to form hypotheses?
3. What are our hypotheses? Are we missing any?
4. Which hypothesis is most likely to advance us towards a solution?
Questions For The Experiment
1. What experiment can we run to test our hypothesis?
2. What do we expect to see?
3. What were the results? Did we see what we expected?
4. What have we learned about the system?
90. TEACHING THIS
• Pairing and Mobbing is an excellent way to share this process and discipline
• Use the Troubleshooting Kata questions to start people thinking in a scientific way
97. BONUS! EXPERIMENTAL MINDSET
Once you get into the habit of thinking about
things in terms of experiments you will find it is
applicable everywhere.
Being more explicit about experimentally
testing what you learn and believe makes you a
more powerful thinker.
100. USEFUL REFERENCES
“Debugging With Scientific Method” - @stuarthalloway talk at Clojure/conj - https://www.youtube.com/watch?v=FihU5JxmnBg
“The Scientific Method of Troubleshooting: A FutureTalk with Blithe Rocher” - https://blog.newrelic.com/2015/08/19/futuretalk-scientific-method-blithe-rocher/
“Scientific Debugging” - http://yellerapp.com/posts/2014-08-11-scientific-debugging.html
Game Coding Complete - Mike McShaffry
“Scientific debugging: Finding out why your code is buggy” - http://www.embedded.com/design/debug-and-optimization/4418635/Scientific-debugging--Finding-out-why-your-code-is-buggy---Part-1
“Each necessary, but only jointly sufficient” - http://www.kitchensoap.com/2012/02/10/each-necessary-but-only-jointly-sufficient/
Implementation Patterns - Kent Beck
101. CREDITS
“Pair Programming” - Lisamarie Babik - https://commons.wikimedia.org/wiki/File:Pair_programming_1.jpg
“Sherlock” - http://today.uconn.edu/wp-content/uploads/2014/07/sherlock-holmes-basil-rathbone.jpg
“Life Is A Mastermind Game” - https://adsoftheworld.com/taxonomy/brand/mastermind
The Four Stages of Competence - https://focuspocusnow.files.wordpress.com/2013/02/4-stages-of-competence.png
“Lab Puppy” - https://pixabay.com/en/puppy-labrador-purebred-retriever-1082141/
“Springfield Blame” - https://c2.staticflickr.com/8/7473/15650030866_f236377785.jpg
Thanks for Coming!
Catch ‘Em All!
Editor's Notes
- team asked to start working on an existing system.
- Java, only a few years old, but already becoming legacy. big enterprise framework, no tests, little documentation
- team didn’t get support - different location; different management hierarchy with different priorities
- trying to build, deploy, run
- strong deadline and pressure
- 3 very smart developers spent 1.5 days, and they just couldn’t get it running.
we mobbed, found the problem, and got it running in 45 minutes
this started me on a path. Why were we successful when they had failed before?
We weren’t successful just because we got in a room. Collaboration isn’t a silver bullet.
I didn’t have some knowledge of the system or expertise the team didn’t have
I think we were successful because of the process we used.
I think sometimes people think good troubleshooting is innate, or due to expertise.
But is effective debugging also something we could teach?
Is it possible that I could get better at it if I was more explicit about the process
Please bear with me, as this part is a little technical, but I think it’s important background.
Effectively, it is a process of choosing the hypothesis, which would best explain the available evidence.
Think of abduction as creating and evaluating competing hypotheses.
Whichever hypothesis is stronger, for whatever reason, wins the abduction.
Sherlock actually used inductive or abductive reasoning.
I don’t know why Conan Doyle used the term “deduction”
I think maybe because it sounds silly to say you are “inducing” or “abducting” something?
Pause for riotous laughter.
Okay, back to the team
They had been given a code repository, and were able to build
It’s a Tomcat app. big-enterprise java, complicated configuration of the app required, mysterious imported enterprise framework.
There were four configuration files in the repo. application wouldn’t start up with them. error was thrown on the console.
asked the team that was “supporting” them they were emailed different versions of configuration files. different error was shown on the console
team was randomly trying different combinations of the configuration files. They’d drop in a new file and try to restart it. Rinse and repeat.
Each restart of app took 10 minutes.
first thing we did was slow down
reverted to the original set of configuration files
We looked at the error on the console and logs.
Came up with hypotheses about what the possible causes of the errors could be
We then came up with a list of things we could try in order to test these hypotheses. Experiments. We then picked one and tried it.
When we ran the experiment we learned something about the system. Either way, pass or fail.
Sometimes we’d have no impact on the error, introduce a new error, and then that experiment didn’t work. We’d undo the changes and try another hypothesis.
Eventually we discovered that the configuration files that had two lines that were causing problems. We found that we needed to take property values from different files to make it work. There were two problems, which is part of the reason why the team couldn’t find the problem. In their experiments they’d change a bunch of things, fix the one error but introduce a different error. They were running such large and unfocused experiments that they didn’t learn anything.
First, a
There’s a lot of philosophy of science, and it’s fascinating stuff. I highly advise you to read all of it.
But for the purposes of this presentation we are going to skip Kuhn’s The Structure of Scientific Revolutions
Regardless the scientific method has served us pretty well for the last couple of hundred years, so let'
explain game
This can be solved by using the scientific method.
The problem is the hidden pegs.
Your hypothesis is a guess at the colours. You experiment by placing the colours and receiving the white and black pegs as the result of your experiment.
You narrow down to the correct solution by running experiments
I’ll come back to mastermind
Explain overview
Slide: Observe failure
behavior that you didn’t expect. In this case the team was unable to start up tomcat
First thing to do is figure out how it should behave.
people miss that step, especially if this problem is coming from a bug report.
Focus on external behavior. What are all of the things we should expect to see if it works.
Ask: What input does it take? What output should it give? And so on.
Write down the problem statement. If you can’t create a concise description you probably don’t understand it well enough.
Take time with your original problem description. Try and be as precise as you can. If you mess up the problem definition, it’s easy to get stuck, or confused.
Can we recreate the problem? If not we need to be able to in order to proceed.
Write down any observations about it.
Now collect some evidence. What actually happens? What inputs did you use? What are the outputs? Are there log files? Can we see other impacts, for example in a database or service? Spend some time digging here, because the more you spend the better the quality of your next step, which is…
Think of as many possible causes for the observed behaviour as you can.
These are your hypotheses. A hypothesis is an attempt to understand and explain what is happening.
Hypotheses can be wrong, and many will be.
Write everything down!
If you can’t form clear hypotheses, you might just not know enough, and need to gather some more data.
There’s nothing wrong with using StackOverflow/Google to help form hypotheses!
Now select the hypothesis that you want to test. This doesn’t have to be the most likely, it might just be the easiest to test.
You’re looking for hypotheses that you think will give you good data, or perhaps just easy ones to run.
Pick experiments which are designed to give you as much information as possible, for the lowest effort.
Experienced developers have an expert, intuitive sense for problems. They’ve done it so much that they eliminate unlikely problems without even knowing why. Like a chess master only looking at the best choices. (could use quote about difference between master and expert in chess, that they actually look at less options).
Before you start your experiment write down what you are doing, what you expect to see, and what it means if the experiment fails.
It could be a description of the metrics you’re going to look at, code or config you’re going to change, query you’re going to run
An experiment could also just be gathering data or measurements, for example from log files, metrics, other visibility tools you have. For example “if we look in the production logs we should see this log statement before the error”
One way of proving a hypothesis is by looking at the code, or stepping through a debugger.
Again, write it down!
E.g. If I change this line in the config file, then I expect that this error should no longer appear in the log.
What happened when you performed the experiment?
For complicated debugging it helps to keep a record of the results, for example save screenshots or copies of log files.
Did the results match your prediction? If not, your hypothesis must be false (unless you made some other mistake).
Do the results suggest a new hypothesis, or refinement to existing hypotheses?
A validated hypothesis doesn’t necessarily mean you’ve fixed the problem. It might just be that you’ve learned something about the system.
If you changed something (e.g. code or config) and the results aren't what you expected, you should undo the change.
If you are able to solve a problem without learning, the problem is just going to reoccur later.
A validated hypothesis doesn’t necessarily mean you’ve fixed the problem. It might just be that you’ve learned something about the system. You might need to go through many experiment loops to learn enough to find the problem.
The scientific method is actually a structured knowledge acquisition process. Solving the problem is a happy fringe benefit.
Finally, happy day! You’ve learned enough about the system to fix the problem!
Snowboard story
Switched from skiing to snowboarding about 10 year ago.
After about 5 years it was the beginning of winter and as I put on my board at the top of the hill I realized I had no idea how to snowboard.
My body did, which way to lean, how to react, but intellectually I had no idea how to snowboard.
I decided to become conscious of what I was doing. Pay attention to how I did it.
Once I did that I could actually begin to improve!
I could experiment with different techniques, deliberately try to get better.
I quickly improved.
Still not good, but better.
When you do something for a while you become competent, but perhaps unconsciously competent
I’m not fond of this model because I think there is a fifth stage, where by becoming more conscious you can increase your competence
You do this by paying attention to what you are doing, in order to consciously improve.
Dare I say, experiment with different ways of doing things?
Another way is by teaching a class, where you have to learn when you teach
I know more French grammar than English
So, I would say “I’m not a great troubleshooter, I’m just a good troubleshooter with great habits”
Okay, lets come back to MasterMind for a minute.
Imagine that you couldn’t see the clues for previous experiments. How much harder would this be?
It would be a huge amount of hubris to think we could remember all of the results of the experiments, but that’s what we do in programming all of the time. We run many experiments, but don’t record the results! Write your experiments down!
Follow the process, and be deliberate at every step.
Btw, I borrowed this metaphor from a talk stuart halloway gave at Clojure/conj
bugs like this require attention to your process - just wildly hacking away will lead you to wandering in circles, confused about the state of the problem, and the system
Write everything down. If you are doing this on your own, keep a lab notebook. If you are working with others or a team, write on a whiteboard, on sticky notes, or in Slack.
Writing it down means you avoid repeating a hypothesis that you’ve already discarded (or should not even consider), just because you forgot that you tested it.
Skipping this makes debugging harder, because your “fixes” cause new problems.
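A sticky note or Slack thread is usually enough, but the idea can even be sketched as a tiny structured log. This is a hypothetical sketch, not a prescribed tool; all names here are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class DebugLog:
    """A minimal lab notebook: one entry per hypothesis/experiment pair."""
    entries: list = field(default_factory=list)

    def record(self, hypothesis, experiment, result, supported):
        self.entries.append({
            "hypothesis": hypothesis,
            "experiment": experiment,
            "result": result,
            "supported": supported,
        })

    def already_rejected(self, hypothesis):
        # Guards against re-testing a hypothesis you have already discarded.
        return any(e["hypothesis"] == hypothesis and not e["supported"]
                   for e in self.entries)

log = DebugLog()
log.record(
    hypothesis="The timeout comes from the connection pool",
    experiment="Doubled the pool size, replayed the failing request",
    result="Request still times out",
    supported=False,
)
```

The point is not the tooling, it is that the record exists: before running an experiment, check whether you have already refuted that hypothesis.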
Compare this to the speed of the build-measure-learn loop in Lean Startup.
Speed of running experiments is key.
If it takes many many minutes (or days) to run an experiment then you will be tempted to take big steps
Sometimes you can only test your hypotheses by going through a complicated or long-lived procedure, for example starting up WebLogic. Sometimes you can only test your hypotheses by promoting to production!
Isolate the problem in a smaller app that you can run in isolation (Unit test tools like xUnit are fantastic for this)
Focus on faster experiments, for example ones that only use logs from previous runs
Tools help here! Become an expert in your debugger, or in some languages, interactive consoles
In my experience you can almost always figure out how to simplify and speed up the experiment, it just takes creativity and effort. It usually pays off
For example it might suck to work in Java, but if you have to then take advantage of the very mature infrastructure!
I am surprised when I work with devs who don’t know the cool things their interactive debugger can do.
Using these tools you can quickly gather data, or run many experiments in seconds, rather than minutes.
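As a sketch of what isolating a hypothesis into a fast experiment can look like: the function and values below are hypothetical, not from any real system, but the shape is the point. Each assert is one experiment, and running the file is the whole lab session, taking milliseconds instead of a deploy cycle:

```python
# Hypothetical suspect function, extracted from a larger system so that
# experiments run in milliseconds instead of requiring a full deploy.
def parse_port(raw):
    # Hypothesis: blank entries in the config file crash this parser.
    stripped = raw.strip()
    return int(stripped) if stripped else None

# Each assert is one experiment against the isolated code.
assert parse_port("8080") == 8080        # happy path
assert parse_port("  8080\n") == 8080    # whitespace from the config file
assert parse_port("") is None            # the case we suspect was blowing up
```

xUnit-style test runners give you the same thing with better reporting, and the tests you write while debugging become regression tests for free.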
(This might be the same point as keeping experiments fast.)
Isolating the problem
Refactoring code to simplify it, so that we could see what it was doing.
The team I mentioned before had this problem. They were under pressure, so they hurried, so they changed many things at once to make it “faster”.
If you skip steps or try to hurry, you will immediately slow down.
If you are a manager protect your developers from pressure (and blame) when they are debugging a critical problem.
Assign someone to communicate status to anxious management types and customers, to allow the devs time to be systematic.
Because of the way the game works, we tend to change multiple things between experiments. We always run multivariate experiments, which makes the results in MasterMind difficult to interpret: you need to combine the results of multiple experiments to figure out what earned the black or white peg. Avoid this at all costs!
An example was the team I mentioned before. They were changing multiple things at the same time, so while they fixed one error they introduced another.
The most important thing when you are starting on a problem is to forget what you already know!
Often we will unconsciously think “I really want it to be this cause”
Your understanding of the system is incomplete, and your guesses as to what the issue is can confound your experimental results. The only real way to guard against this is to watch for it, or, if you can, get somebody else to repeat your experiments.
In particular we have a tendency to blame others for our failure. It’s good form to never send a bug to someone else unless you have evidence that it is their problem.
We also blame tools, frameworks, and other teams.
Your original problem statement will have some assumptions baked into it. Sometimes these will turn out to be just plain false, so some of your experiments should investigate those assumptions and check that they’re correct.
Assumptions can leave you stuck on an idea.
Complex Failures
Complex systems fail in complex ways. Often the failures interact with each other. Assuming that there’s a single root cause is an easy route to misdiagnosis. Instead look for combinations of failures that together explain the issue at hand.
Sometimes more than 1 root cause means that you need to learn more about the system by performing experiments to learn how it works.
A tested, supported hypothesis MIGHT still be a dead end.
Two failures might cancel each other out
Dangers of reductionism.
“5 Whys”, for example, could hide cases where there are multiple root causes, each of which is contributing: necessary but not sufficient.
Avoid “trial and error programming”: balance the scientific method with investigation and learning, so you can form good hypotheses.
Hypotheses and experiments should contribute to your understanding of the system, which informs your hypotheses!
Unfortunately, a lot of debugging uses this approach today. Turn off the machine. Reboot. Try rearranging the order of your statements.
There are things you need to watch out for.
Knowledge Acquisition
The scientific method is also the best way to learn about a system.
For example learning a language/api by writing tests against it. Structured application of the scientific method.
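As a sketch, here is what such “learning tests” can look like against Python’s `re` module. Each assert encodes a hypothesis about how the API behaves, and running the file confirms or refutes all of them at once:

```python
import re

# Hypothesis 1: re.match only succeeds when the pattern matches at the
# start of the string.
assert re.match(r"\d+", "42abc") is not None
assert re.match(r"\d+", "abc42") is None

# Hypothesis 2: re.search scans the whole string for a match.
assert re.search(r"\d+", "abc42") is not None

# Hypothesis 3: when the pattern contains one group, findall returns the
# group contents rather than the full matches.
assert re.findall(r"(a)b", "abab") == ["a", "a"]
```

The tests double as executable documentation of what you learned, and they will tell you if a library upgrade ever changes the behavior you relied on.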
I think sometimes people assume the scientific process is dry and lacks creativity, that it’s a mechanical process. Quite the contrary: this is a process that requires discipline and rigour, but also ingenuity and creativity.
The real problem can be very different from its appearance: for example, an OutOfMemory error that appeared as something else, or an error due to an incorrect version of a library. Seeing through the appearance can take creativity. That’s the imaginative part of abductive reasoning.
Mobbing has entered my toolset, and is now my default approach when there are lots of unknowns, a knowledge gap, divergence between team members, or high criticality.