5 lines I couldn’t draw
Hi everybody
I’m dave
dave@librato.com
@davejosephsen
github: djosephsen
I’m Dave and I work on the Ops team at Librato. In fact you’ve caught me in a bit of a transitional period because
I’ve recently decided to move back to ops after spending two years as Librato’s developer evangelist, which has
been a fascinating role that’s given me the opportunity to really branch out and learn many new things.
Thought Leader Power Moves
For example, here’s a shot of me doing an all male panel, which is just one of the many thought-leader power
moves I’ve perfected over the last couple years in my former role as developer evangelist.
Thought Leader Power Moves
I can also do that faux-earnest, touching my fingers together while wearing a blazer and pontificating at you thing.
That’s totally in my repertoire so if you’re ever in doubt of the validity of my argument I can touch my fingers
together and become super reasonable looking
• “Resource”:
  • Human-being who works here
• “Lead”:
  • Human-being who doesn’t work here
• “Content Marketing”:
  • Annoying people on twitter
• “Engagement”:
  • Tricking people into talking to you
I’ve penetrated the marketing team’s vernacular, so yeah.. that took two years, but I can definitely sit in those
meetings now and.. I mean I pretty much know mostly what’s going on I think.
Thought Leader Power Moves
git push -f origin master!
I can force-push master from a vendor booth. That was a huge achievement in this role. By the way I’ve heard if
you force push master from an all male panel they have to make you CTO. Pro tip. But yeah the vendor booth is like
a second home to me now. I’ve done quite a bit of venderboothing over the last few years.
Graphing Na
Venderboothing
Here we are at twilio signal a few months ago. And something happened at signal I want to tell you about, because
it happens a lot in the course of my venderboothing endeavors where
Venderboothing
I’ll be there in the booth, you know, my fingernails lightly resting against one another. And I’ll be engaging with
leads, and filling our funnel top with branding, and this neckbeardy dude
will kind of slide up, and lurk. He’ll just stand there glaring at me as I do my booth dance, impossible to ignore, like
He’s like, heavy. I mean not literally heavy I’m not body-shaming him, I just mean he’s like… laden with discontent
you know?
BLARG ANOMALY DETECTION!!!
and after listening for a while he’ll just spontaneously interrupt by blurting out something like “WHAT ABOUT
ANOMALY DETECTION!”. And by the way in every case this has been what my wife calls a “catastrophic
digression”,
like this interruption will be so awkward and so rude that it will send people scurrying away from the booth out of
either embarrassment or genuine fear for their own safety. And I’ll want to run away too but not running away from
incredulous neck beards is another skill I’ve developed so instead
I’ll don my thought leader cap and be like “what an informed and thought provoking question, Let me ask you,
what kind of monitoring tools are you using today?”
and inevitably, he’ll launch into this apocalyptic tale of woe, wherein he will describe the faustian hell in which he is
currently trapped. Like his company’s product
Perl!
is, I’m not making this up, a bunch of perl scripts that
OTHER PEOPLES Perl!
some consultant wrote back in 2006, and for monitoring they have these
MORE Perl!
other perl scripts that a different consultant wrote in 2008 and those perl scripts are watching those first perl scripts
WINDOWS!!
And by the way they all run on Windows so it’s not just perl
ACTIVESTATE PERL!
but active state perl, and these active state perl scripts
EMAILZ!
send emails when unexpected things happen or error states are detected
MAPI EMAILZ!!!
and they’re using… MAPI on windows XP to do this,
THOUSANDS OF
MAPI EMAILZ!!!
And of course they’re sending 5000 emails per day because unexpected state is basically always happening and of
course nobody is paying any attention to the emails.
THOUSANDS OF
MAPI EMAILZ!!!
and usually by this point he feels awful and I feel awful so I’ll say something like “wow, that sounds awful, it must be
a really frustrating and stressful environment for you, like I’m sorry to hear it.” And he’ll be all
it’s fine…. and I’ll be like. Man. You sure? because that sounds really bad, like I sincerely want to just hug you right
now. Would you like a hug?
And he’ll be like no really it’s fine, I don’t need a AAAHHH oh my god hug me. And we’ll hug it out and he’ll cry a
little bit, and then I’ll be like. You know what you should do? You should vi that .qmail-root file mister neckbeard.
You know?
| cat > /dev/null && echo 'emailz,1,g' | /usr/bin/statsd
Throw something like this in there. Just dev null those messages and count them instead. I’m not saying that’s your
permanent solution or anything
but i mean you don’t even know what that signal LOOKS like. You might be able to alert on simple message
volume, or the derivative, or maybe just seeing that line will TEACH you something about the behavior or rhythm of
your system.
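Here is a minimal Python sketch of the "count them instead" idea, under some loud assumptions. The one-liner on the slide is shorthand; this sketch assumes a StatsD daemon listening on the usual UDP port 8125 and uses the standard StatsD wire format (`name:value|c` for a counter). The metric name `emailz` comes from the slide; the function names, host, and port are hypothetical. The second half shows the "derivative" part: turning cumulative message counts into a per-second rate you could graph or alert on.

```python
import socket


def format_counter(name, value=1):
    """Build a StatsD counter datagram, e.g. b"emailz:1|c"."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget one counter increment over UDP.

    host and port are assumptions; point them at wherever your
    StatsD-compatible daemon actually listens.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), (host, port))


def rate_per_second(samples):
    """Turn cumulative (timestamp_seconds, total_count) samples into rates.

    This is the discrete derivative of message volume: the change in
    count divided by the change in time between adjacent samples.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        rates.append((c1 - c0) / (t1 - t0))
    return rates
```

In a mail filter you would drain the message from stdin, discard it, and call `send_counter("emailz")` once per message. Whether a rate threshold is a useful alert is something you only find out after looking at the line for a while.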
Ya think?
And he’ll be like? “fa foa foa You think?”
DO EEET!
And I’ll be like heck yeah mister neckbeard. Go giterdone, and he’ll leave happy and with a lighter heart. It’s a funny
story. But it’s also a true story. I’ve had this conversation at least 20 times. And today surrounded by all of you and
my librato teamies it’s easy to forget, but if I’m being honest?
This was me one job ago
I was this neckbeard. This was me. Not a long time ago either but like one job ago, I was him. And I more or less
knew everything about monitoring that I know now, and yet I couldn’t
#1
draw this line, any more than he could. And I can hear you like Really dave emails per second? I mean yes, of course
I could draw a line of emails per second, but then so can he. What neither of us could do is make that cognitive
leap of applying monitoring tools as a means of understanding system behavior independent of alerting. So this is
line number one I couldn’t draw.
Spoiler alert:
There will be 4 more
And I come across these pretty commonly these days where I’ll see some data that we’re using internally and be
like man.. I never could have drawn that one job ago and I just kind of thought it’d be interesting to explore the reasons
why. Live on stage in front of a room full of visionaries.
I was carrying a misapprehension
about what monitoring was and
whom it was for
Because in every case the problem wasn’t a lack of technical aptitude, it was wrapped up in my beliefs and
expectations. In this particular case, the thing that’s causing our cognitive dissonance is that these
perl scripts are sending alerts already. So to us, to mister neckbeard and me, it seemed like monitoring was already in
place. Box checked. Monitoring done.
Because to us monitoring was the thing that made alerts happen. We had no means of describing an undertaking called
monitoring that was meaningfully discrete from alerting. That’s why monitoring is an ops thing. It’s about uptime. So
obviously ops owns it. So as an ops person the pattern was that we would install or inherit this thing
this, guard-dog-like entity. I would tell it where to sit, and it would bark whenever it felt like something wasn’t right.
And it was great at barking. It’d bark all day and night bark bark bark bark bark. And that sucked. so I thought,
maybe I should train it. So I’d put in a ticket to train it
Squirrel!
ZOMG SQUIRREL!!!
It’s RIGHT THERE!
YOUGUISE? SQUIRRELL!
but I’d never work the ticket because it was literally nobody’s priority but mine. I’d feel guilty if I worked on it
because it seemed self-indulgent, or I’d feel guilty if I didn’t because the dog’s sitting over there barking at squirrels all
night and it’s just embarrassing
But what about
squirrels?
and maybe eventually I’d get some time to take it aside and be like listen. Thread contention? Bark. Ice cream
trucks? NO BARK. But no matter what I told it, it was never something that could help me out with stuff like these
perl emails because
Oh sure. Blame
the puppies Dave
ultimately our relationship was prescriptive in the wrong direction. The dog was always telling me what I was
interested in, so the best we could ever hope to achieve was this ongoing negotiation about what to bark at and how
much barking was enough barking. It always became about the barking.
Monitoring is not FOR alerting
Here are two important things I no longer believe. These are the things that I think make me different from mister
neckbeard. First, I don’t believe monitoring is for alerting. It’s not about uptime.
Nobody OWNS monitoring
Next, monitoring is neither my responsibility nor is it my burden. It’s not mine. not my pet. It’s a tape measure. It’s a
tape measure that I get to share with every engineer I work with.
Ops owns
Monitoring
Everyone
owns
Monitoring
I think if you’ve been listening to the real speakers, basically all the ones who aren’t me, you’ll find that this is
probably the most important underlying belief that differentiates them from mister neckbeard. People who run
effective monitoring infrastructure
believe we all get to ask questions. We all get to measure things. Not just ops, not just dev, not just DBA,
everybody who cares, gets to measure, and we all get to use the same tape measure, and it’s perfectly reasonable
to expect to get accurate, timely answers to our questions, and that’s what MONITORING is for.
Monitoring is FOR asking questions
It’s the infrastructure that makes it possible for everyone to understand system behavior. That’s what monitoring is
for me now, today. That’s how it works. Not because I installed or bought some particular collection of tools or
learned about percentiles, but exactly because my expectations have changed. Does that make sense?
You might be fascinated
with anomaly detection
because your input
signal sucks?
What If I Told You:
And mark was right when he said tools matter, but the tools are there today and yet many still suffer. Like you can
build the metrics infrastructure you need right now, that’s hard, and expensive but possible. Or you can buy it, and
that’s easier and expensive but possible. But to make either of those actually work, you still need to change the
people. That’s a lot harder. No combination of bleeding edge tools, no amount of fancy anomaly detection is going
to save mister neck beard until he understands that monitoring is not for alerting, and that measuring things is
everyone’s job.
Complexity Isolates
And Mr Neckbeard has another problem too. He’s embracing complexity. See? To him, those emails are his burden
to bear, his lot in life. They are the hand grenade upon which he will jump to save us all. And that belief isolates
him. It mires him in complexity, and he believes that’s fine. Let me show you what I mean..
#2
Here’s line number two that I couldn’t draw. And I can hear you saying Dave, that isn’t even a line. Like, you
literally had one job dude and the first line was disappointing and this isn’t even a line. But the line you aren’t
seeing here is actually a REALLY important … lack of a line.
because what you aren’t seeing here is the number of people currently using the Librato API who are being
throttled. So slight digression for context this is a pretty common problem when new users are wiring us up for the
first time and what things do they send?
All the things! Cheslock knows all about that, you can ask him. So rather than surprise people with a million dollar
bill, we catch unlikely new ingest by throttling it, and we’ll shoot an email or whatever that’s like hey um, maybe dial that
back unless you actually want to pay us the GDP of Uruguay every month.
and a whole bunch of metrics like this are physically mounted to the wall next to our support team, because they’re
the ones who are going to be there to help the user understand why they’re suddenly getting HTTP 500s, but the
interesting part is that these weren’t created for support. These metrics in fact, were originally put in place by the
engineer who implemented throttling to understand what that signal looked like
And this made me wonder like, how did this happen. How did first level support begin repurposing API metrics?
With whom shall I share
my bounty of hard-won
metric data?
Was there some cuddly API engineer who, in a spontaneous bout of altruism went to go
devops unicorn cuddle with the support team and make rainbow metrics babies of team spirit? That’s amazing, I
want to meet these api engineers, so I went to go talk to them and
they seemed like typical software engineers who exhibit the typical demeanor and mannerisms that one expects
software engineers to manifest. So yeah long story short I think what’s happening here
Cynefin
is something like emergent cynefin.
and if you’re not familiar, cynefin is a framework that helps us make sense of complexity
The idea is that you categorize the complexity you’re dealing with, and then you attempt to move from whatever
category you’re in, to the next less complex category until you hit
obvious right here, which you can see is sort of an uphill climb
Things you need to move:
• Control
• Understanding
• Standardization
but my time at Librato has convinced me that cynefin can be an emergent property of effective monitoring systems.
By which I mean effective monitoring just sort of organically provides you a lot of the stuff you need to move
toward obvious
And I can hear you like really dave? emergent cynefin properties? You should have stuck with unicorn babies of
team spirit. Like do you even know what you sound like when you say shit like that? which, yes, I hear myself
Cynefin
(no, for reals tho)
I mean the processes line up pretty well. What you need to climb the cynefin ladder is pretty much what you get when you give a
decent telemetry system to people who understand that their job is to measure things.
And this is a perfect example. Our support team was able to move from a very opaque and chaotic form of
complexity straight to obvious by repurposing monitoring data from another team, and today my friend Nik on the
support team can look up at that very real, very physical wall and say
“Behold. Throttled users!”, and go talk to them about it. That’s a first-level support team that organically
understands the concept of HTTP return codes, and service-oriented architecture, and API backpressure. Nobody
wrote them a manual for that. That’s Cynefin at work.
I could never draw lines like this… not-line. And the reason I couldn’t draw this not-line was because like mister
neck beard I thought that embracing complex things like 5000 janky perl emails was my job. I thought complexity was
my lot in life,
Me dissecting somebody’s
javascript circa 2003
I was totally the guy who would sit down with that incomprehensible bowl of spaghetti code that some
mean-spirited consultant wrote in 1997 and I’d be like “I hate not understanding this. I’m not leaving until this is
understood. I’m not going to a meeting, I’m not going to lunch, I’m not going home”. And managers would come
looking for me like did Dave show up today? And my teamies would have to be like
don’t bug him he’s dissecting some janky code. This is a real picture, this is my boss Steve at IBM Global Services
bringing me a sandwich. So for the millennials in the audience, this is actually what team spirit looked like in
corporate America in the late 90’s. But then once I had it figured out I’d become the owner of that janky perl
forever. The only person who ever understood it and then people would be like
yo dave, that janky perl thing is broke again. Right? They’d dump it on me when it broke and that’s perfectly
rational, because why should they crawl down there with me? Why should I want them to?
EXPERIENCE
5000 JANKY EMAILS
pain hurts y'all. It’s painful, so embracing it just isolates you, even from other engineers, because there’s only so
much pain each of us can endure, we just can’t really go around willy-nilly embracing each others pain. It’s just not a
tenable scaling model.
But simplicity feels fantastic. Simplicity wants to be shared and celebrated. I should have always been working to
reduce complexity instead of just accepting it, but I never realized that my monitoring tools could help simplify
things.
I used to think this was about as simple as simple got. I used to make things like this when I understood something.
Well I still do, I’ll draw a really complicated picture of the really complicated thing. I was so close, but I just never
took that next step. The one that was like let’s
#3
Simplify that into something that isn’t painful to understand. This is line 3, and now you’re like dave that’s also not a
line, so not only are you 1 for 3 on following through with the click bait listicle title you sold us, line one was
super boring, and you sir, are a lying deceitful fraud.
#3
so OK captain pedantic, here’s the line. I couldn’t have drawn it because I never had an amazing dashboard like this
beneath the lines I would draw
#3
this dashboard is actually a simplification of
this diagram. The curator of those metrics took this diagram and made it obvious, making one row
Row Per SLB
for every SLB in that architecture diagram, which, if you think about it is an interesting way to simplify your
understanding of service ingress because what does every service have in common? a load balancer.
Latency
Availability Traffic
so for each load balancer let’s break down a few golden signals and these are like, if any blocking outage happens
inside any of these services, you’re going to see it in one of these signals, it’s guaranteed. If you don’t see it in one
of these signals, then it’s by definition not a blocking problem.
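As a rough illustration of the one-row-per-load-balancer idea, here is a small Python sketch that rolls request records up into the three signals named on the slide: latency, availability, and traffic. Everything in it is an assumption for illustration: the `Request` record shape, the field names, the choice of median for latency, and counting any 5xx status as unavailable. It is not how the Librato dashboard was built.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    slb: str          # hypothetical load balancer name, e.g. "api"
    status: int       # HTTP status code returned to the client
    latency_ms: float


def golden_signals(requests):
    """Return one row of signals per load balancer.

    traffic      = number of requests seen
    availability = fraction of requests that did not return a 5xx
    latency_ms   = median request latency
    """
    by_slb = defaultdict(list)
    for req in requests:
        by_slb[req.slb].append(req)

    rows = {}
    for slb, reqs in by_slb.items():
        ok = sum(1 for r in reqs if r.status < 500)
        rows[slb] = {
            "traffic": len(reqs),
            "availability": ok / len(reqs),
            "latency_ms": median([r.latency_ms for r in reqs]),
        }
    return rows
```

The point of the shape is the same as the dashboard's: whoever curates the rows does not need to be the person who instrumented the services, as long as both can get at the same request data.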
#3
And I want to stress that again, the person who curated this view was not the person who wrote the instrumentation
to get this data. Different people, different concerns, and the monitoring system is enabling them to work together
to reduce complexity, and aid comprehension.
I never could have drawn that line over this amazing dashboard, I never realized I could use monitoring tools to
build bridges to help other people understand the pain I was experiencing.
Everybody gets to measure things
Nobody OWNS monitoring
And then one day I hire into this shop where everybody can measure things, and nobody owns monitoring
and all these people are building all this stuff, and they’re taking measurements as they go
And then other people get a hold of those signals and refine them, and cynefin happens
And bam, suddenly first-level support understands API backpressure. And I’m trippin out like three weeks ago I was
trapped in a perpetual
Srsly tho; squirrels.
Bark or don’t bark?
tire-fire with this clueless watch-dog and NOBODY cared. Like nobody even KNEW. I was alone with my bij despite
being surrounded by other engineers
#4
I’m sorry it’s just a stark contrast. Like, check out line number four here. This measures the storage latency
introduced by our API matching metric names to UIDs. Point being, this is a subtle latency metric. Like I can’t
describe it to you in less than 15 words. But check this out,
<redacted>
An Ops guy named Benjo, is working with it. That’s pretty crazy right? I mean where I come from, in the tire-fire it’s
atypical for ops people to work with latency data that describes job execution inside the database. I mean in the
tire fire this is what we referred to as somebody else’s problem. But OK Maybe Ben’s just a really savvy guy.
<redacted>
But, wait benjo the ops person is not only wise to this intricate db latency issue but he’s correlating it back to
system metrics. Ok huh, that seems extraordinarily astute to me, I mean even if you have the domain knowledge
<redacted>
Wait hold on a sec, he is actually talking to Data Engineering about this? Ok, savvy, astute and brave. Or maybe he
just doesn’t know that data engineers are mean
<redacted>
<redacted>
woah, what? they’re actually responding and working together with him? And evidently so is the Front-End Team?
Like What in the actual HELL is happening here? How does Ben the ops guy have all this Data Engineering domain
knowledge? And why isn’t anyone being mean to him? Where are the passive-aggressive insults? The hostility and
mistrust I’ve come to expect from engineers working in other teams? I mean this is dev and ops, this is
This is DOGS AND CATS. LIVING TOGETHER. I’m SHOCKED. It’s Shocking!
#4
I’ve certainly never been able to draw lines like that. I never knew enough about what was going on around me to
even work with data like this. I mean this is a line that literally bridges disciplines.
#4
Look, if you squint, you can almost see the bridge that this line creates, between
#4
Data Engineering and Operations. And again I can’t help but wonder how this happened. Is it possible that
effective monitoring can bring about cultural change?
#4
Because it looks to me like that’s what’s happened here. It looks like the combination of an effective telemetry
infrastructure, combined with people who understand that measuring things is their job, has ultimately changed
how people interact with each other in this shop. Good monitoring changes people. That’s kind of mindblowing.
#5
So speaking of culture how much time do I have, ok good, because this is the good part, Line number 5. So this is
a funny story about Bryan, one of our integrations engineers, and this happened a few months ago now, and for the
record I publicly apologize in advance to Bryan. I’m sorry dude, if you’re watching, for shaming you like this on the
internet, but in my defense.. it was pretty funny though
so what happened was, Bryan was working on making our UI faster. And up top he rolls out a change, and that
change? Makes the UI faster. So mission accomplished, good jorb Bryan you done it. And to be clear he’s graphed
the performance data. I mean job done, homework done, he HAS a graph showing the stuff becoming faster.. but
what he’s pasted in here
is not that graph. It’s the mouse over of the tooltip. Like he drew the graph, moused over it, took a screenshot of
the tooltip of the individual datapoints, for a single polling interval, and then pasted THAT in channel. And for
context, not only does Bryan work for a startup whose singular purpose is drawing line graphs depicting time
series data, but his boss at the time is literally
Line graphs FTW
the inventor of graphite AND this conference. Basically he works at line-graph-co for the godfather of linegraphs.
And he’s basically just walked up to the godfather of line graphs like “Behold my assortment of individual
datapoints!” but wait.. it gets better..
And then he says… ignore the zeros! omg so amazing.
I mean look.. if we ignore the zeros, literally we have two values. Like Bryan, sit down, I think it’s time we had the
data-to-ink ratio talk bro. Once upon a time there was a man named Tufte…
And I also want to point out the time stamps here, because it’s only a matter of seconds before his team begins
expressing their confusion, like wait.. what?
Is he messing with me RN?
Like I can almost see Dixon at home, head tilted to the side, unsure if this is some kind of elaborate troll, he’s like
maybe everybody got together and agreed to not paste any line graphs in channel for the whole week. Which
actually is kind of a brilliant troll and also something we’d totally do, but no, this was all Bryan
Anyway, then Bryan facepalms… and throws line number 5 in the channel
And I never could have drawn this line, because I’ve never had a team around me that actually cared this much
about what I was working on. If I came to someone with some amazing data that I was super proud of they’d look at
it like
thatscoolIguessorwhatever
yeah um wow. Thats cool i guess or whatever. doctor mc-showoff. Anywayz pretty busy so please stop being in my
cubicle now.
but look at the love here, these people want to geek out on the data with you. They want to celebrate your win.
Not just in the fortune 500 goals and gift cards way, but by actually quantifying the y-axis of your success. They
want to comprehend your win so bad they are confused when they lack sufficient data to comprehend your win,
which I find astounding.
and at the risk of sounding campy I guess I just wanted to say I love mah teamies at Librato and I love all of you as
well, and I wanted to thank you for working to make effective monitoring happen in your shops, and building tools
to make it happen for other people. So, Sincerely, thank you, you make me want to come to work every day.
Questions?
@davejosephsen
And at this point the conference organizers have insisted that I allow you to ask questions but I have read the code
of conduct which has literally nothing to say on the subject of speakers shouting smoke bomb and running off stage
if they are confronted with a hostile question. SO that is a right I reserve.

More Related Content

Similar to Five Lines I Could Not Draw

10 IMPOSSIBLE THINGS TO DO BEFORE BREAKFAST.DOC
10 IMPOSSIBLE THINGS TO DO BEFORE BREAKFAST.DOC10 IMPOSSIBLE THINGS TO DO BEFORE BREAKFAST.DOC
10 IMPOSSIBLE THINGS TO DO BEFORE BREAKFAST.DOCclive rosen
 
How To Be A Real Developer In Two Easy Steps
How To Be A Real Developer In Two Easy StepsHow To Be A Real Developer In Two Easy Steps
How To Be A Real Developer In Two Easy Stepsnorthofnormal
 
I Just Got Fired
I Just Got FiredI Just Got Fired
I Just Got Firedjasonalba
 
StephenSislerTranscript.docx
StephenSislerTranscript.docxStephenSislerTranscript.docx
StephenSislerTranscript.docxAri Meisel
 
StevenSislerTranscript.docx
StevenSislerTranscript.docxStevenSislerTranscript.docx
StevenSislerTranscript.docxAri Meisel
 
Os Keyshacks
Os KeyshacksOs Keyshacks
Os Keyshacksoscon2007
 
Programming methodology lecture25
Programming methodology lecture25Programming methodology lecture25
Programming methodology lecture25NYversity
 
RubyConf 2022 - From beginner to expert, and back again
RubyConf 2022 - From beginner to expert, and back againRubyConf 2022 - From beginner to expert, and back again
RubyConf 2022 - From beginner to expert, and back againmtoppa
 
Math is Super Cool Script
Math is Super Cool ScriptMath is Super Cool Script
Math is Super Cool Scriptthewhitedove52
 
Programming methodology lecture06
Programming methodology lecture06Programming methodology lecture06
Programming methodology lecture06NYversity
 
Facilitation Guidelines by Unger
Facilitation Guidelines by UngerFacilitation Guidelines by Unger
Facilitation Guidelines by UngerBusiness901
 
People Hacks
People HacksPeople Hacks
People HacksAdam Keys
 
All I Ever Need To Know About Testing I Learned In Kindergarten
All I Ever Need To Know About Testing I Learned In KindergartenAll I Ever Need To Know About Testing I Learned In Kindergarten
All I Ever Need To Know About Testing I Learned In KindergartenEduardo Sinkiko Yonamine Costa
 
Annoying office manners
Annoying office mannersAnnoying office manners
Annoying office mannersRafath Razia
 
5 vital PROCESSES & TOOLS for our STARTUP
5 vital PROCESSES & TOOLS for our STARTUP5 vital PROCESSES & TOOLS for our STARTUP
5 vital PROCESSES & TOOLS for our STARTUPFloown
 
A talk about talking!
A talk about talking!A talk about talking!
A talk about talking!Tanya Reilly
 
Programming methodology lecture10
Programming methodology lecture10Programming methodology lecture10
Programming methodology lecture10NYversity
 

Similar to Five Lines I Could Not Draw (20)

10 IMPOSSIBLE THINGS TO DO BEFORE BREAKFAST.DOC
10 IMPOSSIBLE THINGS TO DO BEFORE BREAKFAST.DOC10 IMPOSSIBLE THINGS TO DO BEFORE BREAKFAST.DOC
10 IMPOSSIBLE THINGS TO DO BEFORE BREAKFAST.DOC
 
How To Be A Real Developer In Two Easy Steps
How To Be A Real Developer In Two Easy StepsHow To Be A Real Developer In Two Easy Steps
How To Be A Real Developer In Two Easy Steps
 
I Just Got Fired
I Just Got FiredI Just Got Fired
I Just Got Fired
 
StephenSislerTranscript.docx
StephenSislerTranscript.docxStephenSislerTranscript.docx
StephenSislerTranscript.docx
 
StevenSislerTranscript.docx
StevenSislerTranscript.docxStevenSislerTranscript.docx
StevenSislerTranscript.docx
 
Living a life of scripts
Living a life of scriptsLiving a life of scripts
Living a life of scripts
 
Os Keyshacks
Os KeyshacksOs Keyshacks
Os Keyshacks
 
Programming methodology lecture25
Programming methodology lecture25Programming methodology lecture25
Programming methodology lecture25
 
RubyConf 2022 - From beginner to expert, and back again
RubyConf 2022 - From beginner to expert, and back againRubyConf 2022 - From beginner to expert, and back again
RubyConf 2022 - From beginner to expert, and back again
 
Habits of genius
Habits of geniusHabits of genius
Habits of genius
 
Math is Super Cool Script
Math is Super Cool ScriptMath is Super Cool Script
Math is Super Cool Script
 
Programming methodology lecture06
Programming methodology lecture06Programming methodology lecture06
Programming methodology lecture06
 
Facilitation Guidelines by Unger
Facilitation Guidelines by UngerFacilitation Guidelines by Unger
Facilitation Guidelines by Unger
 
People Hacks
People HacksPeople Hacks
People Hacks
 
All I Ever Need To Know About Testing I Learned In Kindergarten
All I Ever Need To Know About Testing I Learned In KindergartenAll I Ever Need To Know About Testing I Learned In Kindergarten
All I Ever Need To Know About Testing I Learned In Kindergarten
 
Problem solving
Problem solvingProblem solving
Problem solving
 
Annoying office manners
Annoying office mannersAnnoying office manners
Annoying office manners
 
5 vital PROCESSES & TOOLS for our STARTUP
5 vital PROCESSES & TOOLS for our STARTUP5 vital PROCESSES & TOOLS for our STARTUP
5 vital PROCESSES & TOOLS for our STARTUP
 
A talk about talking!
A talk about talking!A talk about talking!
A talk about talking!
 
Programming methodology lecture10
Programming methodology lecture10Programming methodology lecture10
Programming methodology lecture10
 

Five Lines I Could Not Draw

  • 1. 5 lines I couldn’t draw Hi everybody
  • 2. I’m dave dave@librato.com @davejosephsen github: djosephsen I’m Dave and I work on the Ops team at Librato. In fact you’ve caught me in a bit of a transitional period because I’ve recently decided to move back to ops after spending two years as Librato’s developer evangelist, which has been a fascinating role that’s given me the opportunity to really branch out and learn many new things.
  • 3. Thought Leader Power Moves For example, here’s a shot of me doing an all-male panel, which is just one of the many thought-leader power moves I’ve perfected over the last couple of years in my former role as developer evangelist.
  • 4. Thought Leader Power Moves I can also do that faux-earnest, touching my fingers together while wearing a blazer and pontificating at you thing. That’s totally in my repertoire so if you’re ever in doubt of the validity of my argument I can touch my fingers together and become super reasonable looking
  • 5. • “Resource”: Human-being who works here • “Lead”: Human-being who doesn’t work here • “Content Marketing”: Annoying people on twitter • “Engagement”: Tricking people into talking to you I’ve penetrated the marketing team’s vernacular, so yeah.. that took two years, but I can definitely sit in those meetings now and.. I mean I pretty much know mostly what’s going on I think.
  • 6. Thought Leader Power Moves git push -f origin master! I can force-push master from a vendor booth. That was a huge achievement to get in this role. By the way I’ve heard if you force-push master from an all-male panel they have to make you CTO. Pro tip. But yeah the vendor booth is like a second home to me now. I’ve done quite a bit of venderboothing over the last few years
  • 7. Venderboothing Here we are at Twilio Signal a few months ago. And something happened at Signal I want to tell you about, because it happens a lot in the course of my venderboothing endeavors where
  • 8. Venderboothing I’ll be there in the booth, you know, my fingernails lightly resting against one another. And I’ll be engaging with leads, and filling our funnel top with branding, and this neckbeardy dude
  • 9. will kind of slide up, and lurk. He’ll just stand there glaring at me as I do my booth dance, impossible to ignore. He’s like, heavy. I mean not literally heavy, I’m not body-shaming him, I just mean he’s like… laden with discontent, you know?
  • 10. BLARG ANOMALY DETECTION!!! and after listening for a while he’ll just spontaneously interrupt by blurting out something like “WHAT ABOUT ANOMALY DETECTION!”. And by the way in every case this has been what my wife calls a “catastrophic digression”,
  • 11. like this interruption will be so awkward and so rude that it will send people scurrying away from the booth out of either embarrassment or genuine fear for their own safety. And I’ll want to run away too, but not running away from incredulous neckbeards is another skill I’ve developed, so instead
  • 12. I’ll don my thought leader cap and be like “what an informed and thought-provoking question. Let me ask you, what kind of monitoring tools are you using today?”
  • 13. and inevitably, he’ll launch into this apocalyptic tale of woe, wherein he will describe the Faustian hell in which he is currently trapped. Like his company’s product
  • 14. Perl! is, I’m not making this up, a bunch of perl scripts that
  • 15. OTHER PEOPLE’S Perl! some consultant wrote back in 2006, and for monitoring they have these
  • 16. MORE Perl! other perl scripts that a different consultant wrote in 2008, and those perl scripts are watching those first perl scripts
  • 17. WINDOWS!! And by the way they all run on Windows so it’s not just perl
  • 18. ACTIVESTATE PERL! but ActiveState Perl, and these ActiveState Perl scripts
  • 19. EMAILZ! send emails when unexpected things happen or error states are detected
  • 20. MAPI EMAILZ!!! and they’re using… MAPI on Windows XP to do this,
  • 21. THOUSANDS OF MAPI EMAILZ!!! And of course they’re sending 5000 emails per day because unexpected state is basically always happening and of course nobody is paying any attention to the emails.
  • 22. THOUSANDS OF MAPI EMAILZ!!! and usually by this point he feels awful and I feel awful so I’ll say something like “wow, that sounds awful, it must be a really frustrating and stressful environment for you, like I’m sorry to hear it.” And he’ll be all
  • 23. it’s fine…. and I’ll be like. Man. You sure? Because that sounds really bad, like I sincerely want to just hug you right now. Would you like a hug?
  • 24. And he’ll be like no really it’s fine, I don’t need a AAAHHH oh my god hug me. And we’ll hug it out and he’ll cry a little bit, and then I’ll be like. You know what you should do? You should vi that .qmail-root file mister neckbeard. You know?
  • 25. | cat > /dev/null && echo 'emailz,1,g' | /usr/bin/statsd Throw something like this in there. Just dev null those messages and count them instead. I’m not saying that’s your permanent solution or anything
  • 26. but I mean you don’t even know what that signal LOOKS like. You might be able to alert on simple message volume, or the derivative, or maybe just seeing that line will TEACH you something about the behavior or rhythm of your system.
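
As a rough illustration of the "count them instead" idea from slides 25 and 26, here is a minimal sketch. It assumes a StatsD-style daemon listening on UDP port 8125 and the common `name:value|c` counter format; the slide's `emailz,1,g` payload and `/usr/bin/statsd` path are the speaker's own shorthand, and the host, port, and function names below are hypothetical. The `derivative` helper shows one way message volume could be turned into a per-interval rate.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Build a StatsD-style counter datagram, e.g. b"emailz:1|c"."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget one counter increment over UDP (assumed daemon address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name), (host, port))


def derivative(totals: list[int]) -> list[int]:
    """Turn cumulative message totals into per-interval deltas."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


if __name__ == "__main__":
    # One email arrived: count it instead of reading it.
    send_counter("emailz")
    # Cumulative totals sampled once per interval become a rate-of-change line.
    print(derivative([0, 40, 95, 300]))
```

UDP keeps the sender from blocking on the metrics daemon, which matters when the thing being counted arrives thousands of times a day.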
  • 27. Ya think? And he’ll be like? “fa foa foa You think?”
  • 28. DO EEET! And I’ll be like heck yeah mister neckbeard. Go giterdone, and he’ll leave happy and with a lighter heart. It’s a funny story. But it’s also a true story. I’ve had this conversation at least 20 times. And today surrounded by all of you and my librato teamies it’s easy to forget, but if I’m being honest?
  • 29. This was me one job ago I was this neckbeard. This was me. Not a long time ago either but like one job ago, I was him. And I more or less knew everything about monitoring that I know now, and yet I couldn’t
  • 30. #1 draw this line, any more than he could. And I can hear you like, “Really Dave? Emails per second?” I mean yes, of course I could draw a line of emails per second, but then so can he. What neither of us could do is make that cognitive leap of applying monitoring tools as a means of understanding system behavior independent of alerting. So this is line number one I couldn’t draw.
  • 31. Spoiler alert: There will be 4 more And I come across these pretty commonly these days, where I’ll see some data that we’re using internally and be like man.. I never could have drawn that one job ago, and I just kind of thought it’d be interesting to explore the reasons why. Live on stage in front of a room full of visionaries.
  • 32. I was carrying a misapprehension about what monitoring was and whom it was for Because in every case the problem wasn’t a lack of technical aptitude, it was wrapped up in my beliefs and expectations. In this particular case, the problem, the thing that’s causing our cognitive dissonance, is that these perl scripts are sending alerts already. So to us, to mister neckbeard and me, it seemed like monitoring was already in place. Box checked. Monitoring done.
  • 33. Because to us monitoring was the thing that made alerts happen. We had no means of describing an undertaking called monitoring that was meaningfully discrete from alerting. That’s why monitoring is an ops thing. It’s about uptime. So obviously ops owns it. So as an ops person the pattern was that we would install or inherit this thing
  • 34. this guard-dog-like entity. I would tell it where to sit, and it would bark whenever it felt like something wasn’t right.
  • 35. And it was great at barking. It’d bark all day and night bark bark bark bark bark. And that sucked. So I thought, maybe I should train it. So I’d put in a ticket to train it
  • 36. Squirrel! ZOMG SQUIRREL!!! It’s RIGHT THERE! YOUGUISE? SQUIRRELL! but I’d never work the ticket because it was literally nobody’s priority but mine. I’d feel guilty if I worked on it because it seemed self-indulgent, or I’d feel guilty if I didn’t because the dog’s sitting over there barking at squirrels all night and it’s just embarrassing
  • 37. But what about squirrels? and maybe eventually I’d get some time to take it aside and be like listen. Thread contention? Bark. Ice cream trucks? NO BARK. But no matter what I told it, it was never something that could help me out with stuff like these perl emails because
  • 38. Oh sure. Blame the puppies, Dave ultimately our relationship was prescriptive in the wrong direction. The dog was always telling me what I was interested in, so the best we could ever hope to achieve was this ongoing negotiation about what to bark at and how much barking was enough barking. It always became about the barking
  • 39. Monitoring is not FOR alerting Here are two important things I no longer believe. These are the things that I think make me different from mister neckbeard. First, I don’t believe monitoring is for alerting. It’s not about uptime.
  • 40. Nobody OWNS monitoring Next, monitoring is neither my responsibility nor is it my burden. It’s not mine. Not my pet. It’s a tape measure. It’s a tape measure that I get to share with every engineer I work with.
  • 41. Ops owns Monitoring Everyone owns Monitoring I think if you’ve been listening to the real speakers, basically all the ones who aren’t me, you’ll find that this is probably the most important underlying belief that differentiates them from mister neckbeard. People who run effective monitoring infrastructure
  • 42. believe we all get to ask questions. We all get to measure things. Not just ops, not just dev, not just DBA, everybody who cares, gets to measure, and we all get to use the same tape measure, and it’s perfectly reasonable to expect to get accurate, timely answers to our questions, and that’s what MONITORING is for.
  • 43. Monitoring is FOR asking questions It’s the infrastructure that makes it possible for everyone to understand system behavior. That’s what monitoring is for me now, today. That’s how it works. Not because I installed or bought some particular collection of tools or learned about percentiles, but exactly because my expectations have changed. Does that make sense?
  • 44. What If I Told You: You might be fascinated with anomaly detection because your input signal sucks? And Mark was right when he said tools matter, but the tools are there today and yet many still suffer. Like you can build the metrics infrastructure you need right now, that’s hard and expensive but possible. Or you can buy it, and that’s easier and expensive but possible. But to make either of those actually work, you still need to change the people. That’s a lot harder. No combination of bleeding-edge tools, no amount of fancy anomaly detection is going to save mister neckbeard until he understands that monitoring is not for alerting, and that measuring things is everyone’s job.
  • 45. Complexity Isolates And Mr Neckbeard has another problem too. He’s embracing complexity. See? To him, those emails are his burden to bear, his lot in life. They are the hand grenade upon which he will jump to save us all. And that belief isolates him. It mires him in complexity, and he believes that’s fine. Let me show you what I mean..
  • 46. #2 Here’s line number two that I couldn’t draw. And I can hear you saying Dave, that isn’t even a line. Like, you literally had one job dude and the first line was disappointing and this isn’t even a line. But the line you aren’t seeing here is actually a REALLY important … lack of a line.
  • 47. because what you aren’t seeing here is the number of people currently using the Librato API who are being throttled. So, slight digression for context: this is a pretty common problem when new users are wiring us up for the first time, and what things do they send?
  • 48. All the things! Cheslock knows all about that, you can ask him.. So rather than surprise people with a million dollar bill, we catch unlikely new ingest throttling and we’ll shoot an email or whatever that’s like hey um, maybe dial that back unless you actually want to pay us the GDP of Uruguay every month.
  • 49. and a whole bunch of metrics like this are physically mounted to the wall next to our support team, because they’re the ones who are going to be there to help the user understand why they’re suddenly getting HTTP 500s. But the interesting part is that these weren’t created for support. These metrics, in fact, were originally put in place by the engineer who implemented throttling to understand what that signal looked like
  • 50. And this made me wonder like, how did this happen? How did first-level support begin repurposing API metrics?
  • 51. With whom shall I share my bounty of hard-won metric data? Was there some cuddly API engineer who, in a spontaneous bout of altruism went to go
  • 52. devops unicorn cuddle with the support team and make rainbow metrics babies of team spirit? That’s amazing, I want to meet these API engineers, so I went to go talk to them and
  • 53. they seemed like typical software engineers who exhibit the typical demeanor and mannerisms that one expects software engineers to manifest. So yeah long story short I think what’s happening here
  • 54. Cynefin is something like emergent cynefin.
  • 55. and if you’re not familiar, cynefin is a framework that helps us make sense of complexity
  • 56. The idea is that you categorize the complexity you’re dealing with, and then you attempt to move from whatever category you’re in, to the next less complex category until you hit
  • 57. obvious right here, which you can see is sort of an uphill climb
  • 58. Things you need to move: • Control • Understanding • Standardization but my time at Librato has convinced me that cynefin can be an emergent property of effective monitoring systems. By which I mean effective monitoring just sort of organically provides you a lot of the stuff you need to move toward obvious
  • 59. And I can hear you like really dave? emergent cynefin properties? You should have stuck with unicorn babies of team spirit. Like do you even know what you sound like when you say shit like that? which, yes, I hear myself
  • 60. Cynefin (no, for reals tho) I mean the processes line up pretty well. What you need to climb the cynefin ladder is pretty much what you get when you give a decent telemetry system to people who understand that their job is to measure things.
  • 61. And this is a perfect example. Our support team was able to move from a very opaque and chaotic form of complexity straight to obvious by repurposing monitoring data from another team, and today my friend Nik on the support team can look up at that very real, very physical wall and say
  • 62. “Behold. Throttled users!”, and go talk to them about it. That’s a first-level support team that organically understands the concept of HTTP return codes, and service-oriented architecture, and API backpressure. Nobody wrote them a manual for that. That’s Cynefin at work.
  • 63. I could never draw lines like this… not-line. And the reason I couldn’t draw this not-line was because, like mister neckbeard, I thought that embracing complex things like 5000 janky perl emails was my job. I thought complexity was my lot in life,
  • 64. Me dissecting somebody’s javascript circa 2003 I was totally the guy who would sit down with that incomprehensible bowl of spaghetti code that some mean-spirited consultant wrote in 1997 and I’d be like “I hate not understanding this. I’m not leaving until this is understood. I’m not going to a meeting, I’m not going to lunch, I’m not going home”. And managers would come looking for me like, did Dave show up today? And my teamies would have to be like
  • 65. don’t bug him he’s dissecting some janky code. This is a real picture, this is my boss Steve at IBM global services bringing me a sandwich. So for the millennials in the audience, this is actually what team spirit looked like in corporate America in the late 90’s. But then once I had it figured out I’d become the owner of that janky perl forever. The only person who ever understood it, and then people would be like
  • 66. yo dave, that janky perl thing is broke again. Right? They’d dump it on me when it broke and that’s perfectly rational, because why should they crawl down there with me? Why should I want them to?
  • 67. EXPERIENCE 5000 JANKY EMAILS pain hurts y'all. It’s painful, so embracing it just isolates you, even from other engineers, because there’s only so much pain each of us can endure, we just can’t really go around willy-nilly embracing each other’s pain. It’s just not a tenable scaling model.
  • 68. But simplicity feels fantastic. Simplicity wants to be shared and celebrated. I should have always been working to reduce complexity instead of just accepting it, but I never realized that my monitoring tools could help simplify things.
  • 69. I used to think this was about as simple as simple got. I used to make things like this when I understood something. Well, I still do. I’ll draw a really complicated picture of the really complicated thing. I was so close, but I just never took that next step. The one that was like let’s
  • 70. #3 Simplify that into something that isn’t painful to understand. This is line 3, and now you’re like Dave, that’s also not a line, so not only are you 1 for 3 on following through with the clickbait listicle title you sold us, line one was super boring, and you sir, are a lying deceitful fraud.
  • 71. #3 so OK captain pedantic, here’s the line. I couldn’t have drawn it because I never had an amazing dashboard like this beneath the lines I would draw
  • 72. #3 this dashboard is actually a simplification of
  • 73. This diagram. The curator of those metrics took this diagram and made it obvious by making one row
  • 74. Row Per SLB for every SLB in that architecture diagram, which, if you think about it, is an interesting way to simplify your understanding of service ingress, because what does every service have in common? A load balancer.
  • 75. Latency Availability Traffic so for each load balancer let’s break down a few golden signals, and these are like, if any blocking outage happens inside any of these services, you’re going to see it in one of these signals, it’s guaranteed. If you don’t see it in one of these signals, then it’s by definition not a blocking problem.
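
To make the "one row per SLB" idea from slides 73 to 75 concrete, here is a minimal sketch under stated assumptions: the `Request` record, its field names, and the load balancer names are all hypothetical, and mean latency stands in for whatever latency statistic the real dashboard uses. It only shows how raw request records could be reduced to the three signals named on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    slb: str           # hypothetical load balancer name
    latency_ms: float  # time taken to serve the request
    ok: bool           # False for a failed request (e.g. a 5xx response)


def golden_rows(requests: list[Request]) -> dict[str, dict[str, float]]:
    """Return one row per load balancer: traffic, availability, mean latency."""
    grouped: dict[str, list[Request]] = defaultdict(list)
    for req in requests:
        grouped[req.slb].append(req)

    rows = {}
    for slb, reqs in grouped.items():
        rows[slb] = {
            "traffic": len(reqs),
            "availability": sum(r.ok for r in reqs) / len(reqs),
            "latency_ms": sum(r.latency_ms for r in reqs) / len(reqs),
        }
    return rows


if __name__ == "__main__":
    sample = [
        Request("api-slb", 20.0, True),
        Request("api-slb", 40.0, False),
        Request("web-slb", 10.0, True),
    ]
    for slb, row in golden_rows(sample).items():
        print(slb, row)
```

Grouping by load balancer rather than by service internals is the simplification the slide describes: every service has one, so every service gets the same three columns.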
  • 76. #3 And I want to stress that again, the person who curated this view was not the person who wrote the instrumentation to get this data. Different people, different concerns, and the monitoring system is enabling them to work together to reduce complexity, and aid comprehension.
  • 77. I never could have drawn that line over this amazing dashboard, I never realized I could use monitoring tools to build bridges to help other people understand the pain I was experiencing.
  • 78. Everybody gets to measure things Nobody OWNS monitoring And then one day I hire into this shop where everybody can measure things, and nobody owns monitoring
  • 79. and all these people are building all this stuff, and they’re taking measurements as they go
  • 80. And then other people get a hold of those signals and refine them, and cynefin happens
  • 81. And bam, suddenly first-level support understands API backpressure. And I’m trippin out like three weeks ago I was trapped in a perpetual
  • 82. Srsly tho; squirrels. Bark or don’t bark? tire-fire with this clueless watch-dog and NOBODY cared. Like nobody even KNEW. I was alone with my bij despite being surrounded by other engineers
  • 83. #4 I’m sorry it’s just a stark contrast. Like, check out line number four here. This measures the storage latency introduced by our API matching metric names to UIDs. Point being, this is a subtle latency metric. Like I can’t describe it to you in less than 15 words. But check this out,
  • 84. <redacted> An Ops guy named Benjo is working with it. That’s pretty crazy right? I mean where I come from, in the tire-fire, it’s atypical for ops people to work with latency data that describes job execution inside the database. I mean in the tire-fire this is what we referred to as somebody else’s problem. But OK, maybe Ben’s just a really savvy guy.
  • 85. <redacted> But, wait benjo the ops person is not only wise to this intricate db latency issue but he’s correlating it back to system metrics. Ok huh, that seems extraordinarily astute to me, I mean even if you have the domain knowledge
  • 86. <redacted> Wait hold on a sec, he is actually talking to Data Engineering about this? Ok, savvy, astute and brave. Or maybe he just doesn’t know that data engineers are mean
  • 87. <redacted> <redacted> woah, what? they’re actually responding and working together with him? And evidently so is the Front-End Team? Like What in the actual HELL is happening here? How does Ben the ops guy have all this Data Engineering domain knowledge? And why isn’t anyone being mean to him? Where are the passive-aggressive insults? The hostility and mistrust I’ve come to expect from engineers working in other teams? I mean this is dev and ops, this is
  • 88. This is DOGS AND CATS. LIVING TOGETHER. I’m SHOCKED. It’s Shocking!
  • 89. #4 I’ve certainly never been able to draw lines like that. I never knew enough about what was going on around me to even work with data like this. I mean this is a line that literally bridges disciplines.
  • 90. #4 Look, if you squint, you can almost see the bridge that this line creates, between
  • 91. #4 Data Engineering and Operations. And again I can’t help but wonder how this happened. Is it possible that effective monitoring can bring about cultural change?
  • 92. #4 Because it looks to me like that’s what’s happened here. It looks like the combination of an effective telemetry infrastructure, combined with people who understand that measuring things is their job, has ultimately changed how people interact with each other in this shop. Good monitoring changes people. That’s kind of mindblowing.
  • 93. #5 So speaking of culture, how much time do I have? Ok good, because this is the good part: line number 5. So this is a funny story about Bryan, one of our integrations engineers, and this happened a few months ago now, and for the record I publicly apologize in advance to Bryan. I’m sorry dude, if you’re watching, for shaming you like this on the internet, but in my defense.. it was pretty funny though
  • 94. so what happened was, Bryan was working on making our UI faster. And up top he rolls out a change, and that change? Makes the UI faster. So mission accomplished, good jorb Bryan you done it. And to be clear he’s graphed the performance data. I mean job done, homework done, he HAS a graph showing the stuff becoming faster.. but what he’s pasted in here
  • 95. is not that graph. It’s the mouse over of the tooltip. Like he drew the graph, moused over it, took a screenshot of the tooltip of the individual datapoints, for a single polling interval, and then pasted THAT in channel. And for context, not only does Bryan work for a startup whose singular purpose is drawing line graphs depicting time series data, but his boss at the time is literally
  • 96. Line graphs FTW the inventor of graphite AND this conference. Basically he works at line-graph-co for the godfather of linegraphs. And he’s basically just walked up to the godfather of line graphs like “Behold my assortment of individual datapoints!” but wait.. it gets better..
  • 97. And then he says… ignore the zeros! omg so amazing.
  • 98. I mean look.. if we ignore the zeros, literally we have two values. Like Bryan, sit down, I think it’s time we had the data-to-ink ratio talk bro. Once upon a time there was a man named Tufte…
  • 99. And I also want to point out the time stamps here, because it’s only a matter of seconds before his team begins expressing their confusion, like wait.. what?
  • 100. Is he messing with me RN? Like I can almost see Dixon at home, head tilted to the side, unsure if this is some kind of elaborate troll. He’s like, maybe everybody got together and agreed to not paste any line graphs in channel for the whole week. Which actually is kind of a brilliant troll and also something we’d totally do, but no, this was all Bryan
  • 101. Anyway, then Bryan facepalms… and throws line number 5 in the channel
  • 102. And I never could have drawn this line, because I’ve never had a team around me that actually cared this much about what I was working on. If I came to someone with some amazing data that I was super proud of they’d look at it like
  • 103. thatscoolIguessorwhatever yeah um wow. That’s cool I guess or whatever, doctor mc-showoff. Anywayz pretty busy so please stop being in my cubicle now.
  • 104. but look at the love here, these people want to geek out on the data with you. They want to celebrate your win. Not just in the Fortune 500 goals and gift cards way, but by actually quantifying the y-axis of your success. They want to comprehend your win so bad they are confused when they lack sufficient data to comprehend your win, which I find astounding.
  • 105. and at the risk of sounding campy I guess I just wanted to say I love mah teamies at Librato and I love all of you as well, and I wanted to thank you for working to make effective monitoring happen in your shops, and building tools to make it happen for other people. So, sincerely, thank you, you make me want to come to work every day.
  • 106. Questions? @davejosephsen And at this point the conference organizers have insisted that I allow you to ask questions, but I have read the code of conduct, which has literally nothing to say on the subject of speakers shouting “smoke bomb” and running off stage if they are confronted with a hostile question. So that is a right I reserve.