The Unrealized Role of:
Monitoring & Alerting
@jasonhand | VictorOps | #AllDayDevOps
JASON HAND
DevOps Evangelist
VictorOps
2015 MONITORING SURVEY
WHY ARE YOU COLLECTING THIS DATA?
NOTE: You may choose more than one
▸ Performance analysis and trending
▸ Fault and Anomaly detection
▸ Capacity Planning
▸ A/B Testing
THE RESULTS
NOTE: Respondents may have chosen more than one
▸ Performance analysis and trending - 63%
▸ Fault and Anomaly detection - 53%
▸ Capacity Planning - 45%
▸ A/B Testing - 11%
Tyranny of the S.L.A. (Service Level Agreement)
HIGH AVAILABILITY
Prediction & Prevention
THAT'S IMPORTANT
... BUT ...
BUSINESS OBJECTIVES?
HAPPY CAMPER
CUSTOMERS want more than just
99.999% UPTIME
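For a sense of scale, 99.999% uptime leaves a little over five minutes of downtime per year. The sketch below works that out for a few availability targets. The function name and the 365-day year are assumptions made for illustration, not figures from the talk.

```python
# Minutes in a 365-day year (assumption: leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_percent):
    """Return the yearly downtime (in minutes) allowed by an availability target."""
    unavailable_fraction = (100 - availability_percent) / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```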
WHERE'S THE
INNOVATION?
HOW IMPORTANT
IS
Learning & Innovation?
The result of underutilizing monitoring & alerting
is that the IT department and the organization have
no chance to...
LEARN,
IMPROVE, OR
INNOVATE.
CONTINUALLY UNDERSTANDING & RESPONDING
TO THE FEEDBACK
from
monitoring, logging, & alerting
allows you to use information about events in the past to drive future
actions.
It's not just about
PREDICTION
& PREVENTION
RESPOND &
REPAIR
...QUICKLY
NOPE
MTTR Rather Than
MTBF
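To make the two metrics concrete, here is a minimal sketch that computes both from an incident log. The incident timestamps are invented for illustration. The definitions used are one common convention, assumed here rather than taken from the talk: MTTR is the average outage duration, and MTBF is the average uptime between the end of one outage and the start of the next.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (time service went down, time service was restored).
incidents = [
    (datetime(2016, 1, 4, 9, 0), datetime(2016, 1, 4, 9, 30)),
    (datetime(2016, 1, 11, 14, 0), datetime(2016, 1, 11, 14, 10)),
    (datetime(2016, 1, 25, 2, 0), datetime(2016, 1, 25, 2, 20)),
]


def mttr(incidents):
    """Mean time to repair: the average duration of an outage."""
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


def mtbf(incidents):
    """Mean time between failures: the average uptime between consecutive outages."""
    ordered = sorted(incidents)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


if __name__ == "__main__":
    print("MTTR:", mttr(incidents))
    print("MTBF:", mtbf(incidents))
```

Tracking MTTR over time shows whether response and repair are getting faster, which is the number the slide argues for.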
FAILURE IS
INEVITABLE
US·ER
/ˈYOOZƏR/
DISTRIBUTED FAULT INJECTION TEST SUITE FOR
PRODUCTION.
credit: Leon Fayer (@papa_fire)
SUCCESS
is a result of
FAILURE
UNDERSTAND
LEARN
INNOVATE
RE·SIL·IENT /RƏˈZILYƏNT/
The ability to resist, absorb, recover from or successfully adapt to
adversity or a change in conditions
CHANGE
can cause failure
but innovation requires
CHANGE
CONFLICT
CHANGE REQUIRED
Without deviation from the norm,
progress is not possible
— Frank Zappa
What Did You
LEARN From the Recovery Efforts?
(including monitoring & alerting)
POSTMORTEMS / LEARNING REVIEWS:
Stories of:
WHAT TOOK PLACE
leading up to & during
the disruption & recovery efforts
WHO WAS
INVOLVED?
WHAT DID THEY
SEE?
WHAT WAS
SAID?
WHAT
ACTIONS WERE TAKEN?
jhand.co/chatopsbook
HOW DO
events & actions
CORRELATE
OVER TIME?
5 Whys
WHAT IS THE "cause"
OF THE PROBLEM?
Root Cause is ...
OUR...
obsession with
"Root Cause"
ASKING "WHY"
.. leads to ..
BLAME
BLAMING
LEADS TO..
operators hiding relevant & important
information
We must
BELIEVE that our operators are doing their best given the
constraints of the "system"
"We are here to"
LEARN From Failure
(and success)
RATHER THAN ..
AVOID FAILURE
WHAT'S THE
STORY?
INNOVATE
Learning from both success & failure
to develop & implement
small incremental improvements
is critical.
MONITORING &
ALERTING
Helps us understand the story in greater detail
LEARNING
ORGANIZATION
Learning does NOT come from
READING
&
LISTENING
Learning comes from
DOING
Real Learning comes from:
OBSERVING
ORIENTING
DECIDING
ACTING
John Boyd's OODA Loop
Example:
LEARNING TO PLAY THE
DOBRO GUITAR
LEARNING
WHY?
Go from knowing...
to understanding...
to learning
NOTE:
(Requires making mistakes)
We will trade some uptime in exchange for innovation
-Dave Hahn (Netflix)
DevOpsDays Boise 2016
SHIFT OUR GAZE
from:
MAINTAINING
& PROTECTING
LEARNING
Which leads to...
IMPROVING
& INNOVATING
WE INCREASE VALUE OF:
- Monitoring & Alerting
- IT teams
- Products & Services
- Organization
HYPOTHESIZE
EXPLORE
STRETCH
EXPERIMENT
FAIL
LEARN
Try Again
LEARNING & INNOVATING
leads to uncovering new ways of
BUILDING, DEPLOYING, AND MAINTAINING
SOFTWARE & INFRASTRUCTURE
Which leads to...
RESILIENT SYSTEMS
The
By-product
of a highly
RESILIENT
system is ...
HIGHLY
AVAILABLE
SYSTEM
THE UNREALIZED
ROLE OF:
Monitoring
& Alerting is ....
LEARNING
&
INNOVATION
THANK
YOU
Be Victorious!
References:
Monitoring Survey: https://kartar.net/2015/08/monitoring-survey-2015---metrics/
Firefighter: https://www.learyfirefighters.org/wp-content/uploads/2013/09/cover-slide-1.jpg
Mechanic: https://upload.wikimedia.org/wikipedia/commons/4/4b/Flickr_-_Israel_Defense_Forces_-_Airplane_Technician,_March_2010.jpg
Gnome Plan: http://www.nerdfitness.com/wp-content/uploads/2012/04/Screen-Shot-2012-03-30-at-3.15.38-AM-1024x7591.jpg
NOC: https://upload.wikimedia.org/wikipedia/commons/0/03/
Kodak: http://file.answcdn.com/answ-cld/image/upload/v1/tk/brand_image/b59911fc/91d6e71d30a0878dfe3cb30a22751cb874a3ea8c.jpeg
VW Camper: https://upload.wikimedia.org/wikipedia/commons/d/d7/VW_Camper.jpg
Blockbuster: https://jordanandeddie.files.wordpress.com/2013/11/blockbuster-feature.jpg
Borders: http://smashingtops.com/wp-content/uploads/2012/06/borders_logo1.jpg
Chained Hands: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwjgrNCDh5TMAhXJs4MKHaoZDssQjBwIBA&url=http%3A%2F%2Fwww.publicdomainpictures.net%2Fdownload-picture.php%3Fadresar%3D50000%26soubor%3Dhands-in-chains.jpg%26id%3D40426&bvm=bv.119745492,d.amc&psig=AFQjCNFIdnDPzSqiLA-znIW5SCTCUHhqEw&ust=1460926880336203
Inevitable: http://vignette4.wikia.nocookie.net/matrix/images/5/51/SMITH.png/revision/latest?cb=20110214092002
Bulb: https://smhttp-ssl-37293.nexcesscdn.net/media/catalog/
scoreboard/1000/Safety-Awareness-Sign-DSE-195271000.gif
Stewie: http://chroniclesofredmark.com/wp-content/uploads/2014/01/Stewie.gif
change: http://i.imgur.com/EQyC6N3.gif
Hard drive: https://i.imgur.com/pWsKSEf.gif
Change: https://farm6.staticflickr.com/5208/5270199049df99b234e9od.jpg
Value: https://d13yacurqjgara.cloudfront.net/users/6437/screenshots/1405551/value-cropped.gif
