Monitoring:
Doing it the right way
Saving your sanity, making the world a better place.
John-Daniel Trask
Raygun
Who is this person?
Funny voice.
Weird name.
What’s his deal?
What we’re covering
Doing monitoring the right way.
Getting started, but also helping identify potential issues with your current monitoring.
Getting started with monitoring
Why monitor things?
• You’re not employed to write code.
Business value?
• I got a CS degree mate, not an MBA
Framework
1. Do a best-efforts analysis of what to monitor
• Bad things
• Good things
• Limit to a sprint or two of effort, you won’t get it perfect.
2. Perform post mortems to identify gaps in your monitoring
3. Update/improve monitoring based on findings
4. GOTO 2
Getting started
1. Something is better than nothing.
2. You can go a long way with some simple tools
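As one illustration of how far a simple tool can go: StatsD, which this deck lists later among the things you could run, accepts plain-text datagrams of the form `name:value|type` over UDP. The sketch below is only an assumption-laden example, not a description of Raygun's setup. The metric names, the host `127.0.0.1` and the helper functions are hypothetical; port 8125 is the conventional StatsD default.

```python
import socket


def format_metric(name: str, value: int, metric_type: str) -> bytes:
    """Build a StatsD datagram such as b'checkout.errors:1|c'."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP; monitoring should never crash the app."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))
    except OSError:
        pass  # no network or no listener: drop the metric silently


# Hypothetical metric names, chosen for illustration only.
counter = format_metric("checkout.errors", 1, "c")    # counter increment
timer = format_metric("checkout.db_call", 42, "ms")   # timing in milliseconds

send_metric(counter)
send_metric(timer)
```

Because UDP is connectionless, the calls above succeed even when no StatsD server is listening, which is part of why instrumenting code this way is cheap to start with.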
Metrics & Monitoring
• Metrics are a given value or measure.
• Monitoring encapsulates everything.
Metric: error rate over time
Full monitoring: full story about an error
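To make the "metric" half of that distinction concrete, here is a minimal sketch of deriving an error rate over time from raw request events. The event data, the one-minute bucket size and the function name are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical request log: (unix_timestamp_seconds, succeeded)
requests = [
    (0, True), (10, True), (20, False), (50, True),     # minute 0
    (60, True), (70, False), (80, False), (110, True),  # minute 1
    (120, True), (130, True),                           # minute 2
]


def error_rate_per_minute(events):
    """Return {minute_index: failed_requests / total_requests}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for timestamp, succeeded in events:
        minute = timestamp // 60
        totals[minute] += 1
        if not succeeded:
            failures[minute] += 1
    return {minute: failures[minute] / totals[minute] for minute in sorted(totals)}


rates = error_rate_per_minute(requests)
for minute, rate in rates.items():
    print(f"minute {minute}: {rate:.0%} of requests failed")
```

The number shows that minute 1 was worse than its neighbours. It does not say which error, which user, or why, and that gap is what the "full story about an error" fills.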
Monitoring vs. Observability
• Is there a difference?
User
Server
Application
Know what to measure
• You could track almost anything
• Crash reporting
• JavaScript log aggregation
• Metrics server (statsd)
• Alerting and pager tools
• Dashboarding tools
• Usage monitoring
• Real User Monitoring
• Structured and unstructured logging
• Up time monitoring
• Network monitoring
• Application performance monitoring
• Wire-level monitoring
• Server monitoring
• Canary logging
• Log aggregation service
• Distributed tracing
• Intrusion detection monitoring
• Employee device monitoring
• Cloud metrics from cloud provider
• Security monitoring
• Custom event tracking
• Advanced visualizing tooling
• Deployment tracking
• Infrastructure change monitoring
• User navigation and click tracking monitoring
• Infrastructure spend monitoring
The obvious
• Errors & error rate
• Server performance
• Requests per second per service
• Database call times
What about the less obvious?
• Back to basics: business value users!
Amazon example
• When is the page loaded?
What about the less obvious?
• Cost to serve each customer
• Feature use tracking to double down on what customers do the most
• Good things
• Any you’d add?
Getting the most from monitoring
Connect the dots
• Connect all your data together
• Connect teams
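The first bullet, connecting data, can be sketched as joining separate monitors on time and asking which component moved when the user-facing number did. Everything below is hypothetical: the per-minute median latencies, the component names, and the crude "three times the quietest minute" spike rule are assumptions for illustration, not a recommended detection method.

```python
# Hypothetical per-minute median latencies (ms) from three separate monitors,
# keyed by minute index so they can be joined on time.
response_time = {0: 120, 1: 125, 2: 980, 3: 1010, 4: 130}
database_time = {0: 40, 1: 42, 2: 870, 3: 905, 4: 41}
cache_time = {0: 5, 1: 6, 2: 5, 3: 6, 4: 5}


def spikes(series, factor=3):
    """Minutes where the value exceeds `factor` times the series minimum."""
    baseline = min(series.values())
    return {minute for minute, value in series.items() if value > factor * baseline}


# Minutes where the user-facing response time spiked.
slow_minutes = spikes(response_time)

# For each backend component, which of those minutes did it also spike in?
suspects = {
    name: sorted(spikes(series) & slow_minutes)
    for name, series in {"database": database_time, "cache": cache_time}.items()
}
print(sorted(slow_minutes), suspects)
```

With this made-up data the database spikes in the same minutes as the response time while the cache stays flat, which is the kind of quick narrowing-down that correlated data makes possible.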
Information Radiators
• A fancy way of saying TV
Averages are lies
• Yet so many monitoring tools focus on them
On Average, everyone here is worth $900m.
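A small sketch of how one outlier produces a headline like that. The room size and the dollar figures are assumed for illustration only; the deck does not give the underlying numbers.

```python
import statistics

# Assumed numbers for illustration only: 99 attendees worth $100k each,
# plus one person worth $90bn walking into the room.
net_worths = [100_000] * 99 + [90_000_000_000]

mean_worth = statistics.mean(net_worths)
median_worth = statistics.median(net_worths)

print(f"mean:   ${mean_worth:,.0f}")    # dominated by the single outlier
print(f"median: ${median_worth:,.0f}")  # what a typical attendee is worth
```

Under these assumptions the mean comes out at roughly $900m while the median stays at $100,000. The same thing happens to response times when a handful of requests are very slow.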
Quantiles
• Median
• P90
• P99
• P99.9
[Figure: bell curve marking the median, P25 and P75]
Why are quantiles hard?
• You need to store everything
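A minimal sketch of why exact quantiles are expensive: the straightforward method keeps every measurement, sorts the lot, and reads off a position. The nearest-rank definition used here is one common choice among several, and the response times are made up. The storage arithmetic at the end uses the 100 million events and 64-bit values mentioned in the speaker notes.

```python
import math


def percentile(values, p):
    """Exact nearest-rank percentile: needs every value, sorted."""
    ordered = sorted(values)  # O(n log n) over the full dataset
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, with one slow outlier.
response_times_ms = [12, 15, 11, 14, 13, 16, 12, 900, 14, 13]
p50 = percentile(response_times_ms, 50)
p90 = percentile(response_times_ms, 90)
p99 = percentile(response_times_ms, 99)

# Raw storage for 100 million 64-bit (8-byte) measurements.
events = 100_000_000
bytes_needed = events * 8
mebibytes = bytes_needed / (1024 * 1024)

print(p50, p90, p99, f"{mebibytes:.0f} MiB")
```

That is 800,000,000 bytes, or about 763 MiB, held in memory and sorted just to answer one percentile question. A running average needs only a sum and a count, which helps explain why so many tools settle for averages.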
Common monitoring mistakes
Common mistakes
• Only measuring your servers
• Only measuring the server
• Saving money by flying blind
• Bad sampling of data
• Building it yourself
• Making it difficult to add to new systems
• Making it difficult to consume the data
• Just buying/installing a tool doesn’t help
• Not getting out of the building
• NEW: Compliance!
• Anyone have a mistake they’d love to share?
References & Links
• Observability vs. Monitoring: https://medium.com/@copyconstruct/monitoring-and-observability-8417d1952e1c
• Coda Hale, Metrics, Metrics Everywhere: https://www.youtube.com/watch?v=czes-oa0yik
• Google Site Reliability Book: https://landing.google.com/sre/
• Developers are your GDPR risk: https://jdtrask.com/post/software-developers-are-your-biggest-gdpr-risk.html
• Netflix tech blog: https://medium.com/netflix-techblog/
Questions?
Thank you for coming!
@traskjd
@raygunio
Raygun.com (I also have some swag)

Doing monitoring right

Editor's Notes

  1. I’m John-Daniel Trask, or JD to everyone. My first name is two names. I’ve loved code since the age of 9: more than 25 years of coding away any chance I got. I’m a 10-year Microsoft MVP, a distinguished alumnus, and was awarded Wellingtonian of the Year in science and technology. I have VM snapshots of various machines, and thought it amusing that I was writing monitoring tools in my early teens (“Console”, which would track everything). I ran businesses through high school and university. At high school I sold “browser privacy tools” to classmates… In 2013 we launched Raygun, a software crash reporting product. In 2015, a Real User Monitoring product. And in April we announced our innovative approach to APM. We’re processing billions of data points while I’m standing here. A lot of my learnings are from our own experience in monitoring, but also from conversations with customers.
  2. Reminder, in case you’re in the wrong room or can’t remember what this talk was going to be about. The target is more for folks getting started, but I aim to provide value even to the folks focusing on monitoring in their org. The slides will be posted online. Easiest way to get them once posted: follow me on Twitter: traskjd. This is about monitoring your software, not everything else (e.g. osquery for monitoring your team machines).
  3. How should we be thinking about monitoring? Here’s how to get started, how to think about monitoring and even if you have monitoring in place, hopefully this challenges your thinking about what monitoring is really about.
  4. Coda Hale: You’re not employed to code, you’re employed to create business value.
  5. What is business value? Adding a new feature that customers want; improving an existing feature to please customers; reducing bugs that annoy customers; making our software faster so it doesn’t annoy our customers; making our site look better (could be worse!) to please customers. What is the common thread? Customers. I talk about ‘we write code for human beings’, yet most of us rarely think about the user, or worse, hold them in disdain.
  6. This is a basic getting started framework. Fact is, there’s so much stuff out there to help. Look at Raygun, we do 3 things now – CR, RUM, APM. Still get asked about Logs, custom metrics, uptime monitoring, security reporting, statsd endpoints, wire level monitoring,
  7. Big one for Raygun was StatsD.
  8. This was what got us excited – so easy to start instrumenting our code.
  9. Metrics are great for spotting trends, or issues, but they don’t tell you the why or how. The “what’s broken” indicates the symptom; the “why” indicates a (possibly intermediate) cause. “What” versus “why” is one of the most important distinctions in writing good monitoring with maximum signal and minimum noise.
  10. While here’s the full story, the data behind the metric. Helping me as a developer figure out the HOW and the WHY, so I can resolve the issue.
  11. There is a discussion going on about these two, where the basics seem to be that observability is a superset of monitoring. Twitter defined observability as: monitoring; alerting/visualization; distributed systems tracing infrastructure; log aggregation/analytics. However, I count all of that as monitoring. https://medium.com/@copyconstruct/monitoring-and-observability-8417d1952e1c
  12. Something at each level. It doesn’t need to be perfect, but it shouldn’t lie to you (more on this later!). Why have I ordered it this way? The user is the most important. If they aren’t happy, we aren’t getting paid. Best to track that most. The application helps understand things that are likely to impact the user. Then server monitoring. But isn’t server monitoring super important? It is, but oftentimes its value is in correlating to user monitoring. For example, measure user server load experience; if it’s slow, look at the server data being correlated with it. Maybe it’s a sign of maxed out
  13. Next slide
  14. Look at this, here’s just some stuff we could be doing…. So let’s get real. It’s why my framework is to only do some at the start and then build it up over time. Trying to handle everything will waste a lot of time, money and won’t help. You’ll still find issues (kind of like 100% code coverage in unit tests – you still have bugs)
  15. I’m biased, but errors are very easy to add and a high-value thing to track. They are literally where you crap all over your customer. We see this: “we don’t use this anymore”, but they have 68,000 users a month getting errors… I wonder what the CEO would think about the team not bothering with 68,000 customers being let down each month. It also gives you the ammunition you need to ask for time to pay down technical debt, which is common, when engineers typically get asked to keep doing feature development.
  16. While the items that I listed impact users, we also want to be creative and think about the non-obvious.
  17. Forget about the “well technically”, which is common for us engineers. Think about the business value, the end user. That changes what we measure!
  18. There are lots of things that aren’t immediately obvious. However, they can create enormous business value. Cost to serve is a huge one for many earlier-stage organizations. If you’re spending more to provide the service than the customer pays, you won’t be around very long. This is a number typically managed by VPs or higher, but helping them is never a bad idea. It also leads to helping understand the cost to scale. I’m sure there are some examples in the audience? What’s a thing you monitored and were surprised by?
  19. Getting the most out of your investment
  20. Connect your data together. The key is often being able to easily correlate data across different monitors. For example, seeing a response time start exploding and rapidly identifying whether there’s an activity issue on your web server, the underlying database, one of the caches, etc. Connect your teams. One of the biggest wins we see is making monitoring more than just an engineering or SRE concern. Being able to lift error reports into Jira is one example – it connects product and project managers and helps them work how they like to, but in collaboration with engineering.
  21. TVs. Just like I believe whiteboards are better than almost any digital equivalent, getting dashboards of live data on the wall is amazing. Suddenly key metrics become part of the water cooler chat. Jump to next slide.
  22. Averages are lies. Why do so many tools in this area use them? Because it’s super cheap. But a cheap lie doesn’t make it a good lie.
  23. Quantiles help us understand distribution
  24. Bell Curve - What we’re taught distributions look like. - This shows the median and the 25% and 75% - This is kind of bullshit. Think back to the Gates example, it ain’t a bell curve distribution. It’s almost always the same in software.
  25. Actual distribution - This is more common - Sometimes you may even see a lump near the end - Understanding outliers is key to better monitoring
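To make the "averages are lies" point concrete, here is a minimal Python sketch. The response times are made up for illustration, and `percentile` is a hypothetical helper using the nearest-rank method (one of several percentile definitions), not the method of any particular monitoring tool.

```python
import statistics

# Hypothetical response times in milliseconds: 95 fast requests and
# 5 very slow ones (a long tail rather than a bell curve).
response_times_ms = [100] * 95 + [5000] * 5


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 0 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct/100 * n, avoiding floating point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


mean_ms = statistics.mean(response_times_ms)
median_ms = percentile(response_times_ms, 50)
p99_ms = percentile(response_times_ms, 99)

print(f"mean:   {mean_ms} ms")    # 345 ms, a time no user actually saw
print(f"median: {median_ms} ms")  # 100 ms, the typical experience
print(f"p99:    {p99_ms} ms")     # 5000 ms, the tail the mean hides
```

In this toy data set the mean lands at 345 ms, which describes neither the fast majority nor the slow tail. The median and p99 together tell the real story.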
  26. Why doesn’t more tooling support this? You need to store A LOT of data, and you need to then look at the % points after sorting it. This gets very slow. Example: 100m events, which is not actually a lot. 8 bits in a byte, 64-bit numbers, you’re loading 762MB of data into memory, sorting it and taking single values at positions. Even if 32-bit it’s a lot of data, but remember – 100m events is not that much when it comes to machine data!
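The arithmetic behind that memory figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes one raw number per event and ignores any overhead from the language runtime or data structure, and the names below are chosen for illustration.

```python
EVENTS = 100_000_000          # 100m events, as in the example
BYTES_PER_64BIT = 64 // 8     # 8 bits in a byte, so 8 bytes per value
BYTES_PER_32BIT = 32 // 8     # 4 bytes per value
MIB = 1024 * 1024             # bytes in one mebibyte


def mib_needed(event_count, bytes_per_value):
    """Raw memory needed to hold every value, in MiB."""
    return event_count * bytes_per_value / MIB


mib_64 = mib_needed(EVENTS, BYTES_PER_64BIT)
mib_32 = mib_needed(EVENTS, BYTES_PER_32BIT)

print(f"64-bit values: {mib_64:.1f} MiB")  # about 762.9 MiB
print(f"32-bit values: {mib_32:.1f} MiB")  # about 381.5 MiB
```

All of that has to be held and sorted before a single percentile can be read off, which is why exact quantiles cost so much more than a running average.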
  27. Getting the most out of your investment
  28. What happens on your server is not what happens to the user. Ensure you track the customer experience. Note about RUM and what we see with today’s very heavy JS frameworks
  29. Noticing a trend here? I’m big on making sure we always focus on the user.
  30. Not uncommon to see tech teams try and avoid the costs associated with monitoring. They might only monitor some things, or only a few servers. This causes problems. Also, asking for money is easy if you are connecting it to the business value. Noticing a pattern here? 
  31. Sampling has a place, but be wary of how your tools use it. Example: ecommerce provider with 1 server, costing 10% of all sales. Another crash reporting (CR) tool was sampling but buried that note in their docs, so the customer couldn’t see the issue
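A rough Python sketch of why undisclosed sampling is dangerous: the raw count a sampled tool shows understates the real problem unless you know the sample rate and scale by it. The traffic numbers and the 10% sample rate are assumptions for illustration only, not figures from the story above.

```python
import random

rng = random.Random(42)   # fixed seed so the run is repeatable
SAMPLE_RATE = 0.10        # hypothetical: the tool keeps 10% of events

# Hypothetical traffic: 100,000 requests, 1,000 of which failed.
events = [True] * 1_000 + [False] * 99_000


def count_errors(event_list):
    """Count the events flagged as errors."""
    return sum(1 for is_error in event_list if is_error)


# Keep each event independently with probability SAMPLE_RATE.
sampled = [e for e in events if rng.random() < SAMPLE_RATE]

true_errors = count_errors(events)
seen_errors = count_errors(sampled)
# Only possible if you know the sample rate in the first place.
estimated_errors = seen_errors / SAMPLE_RATE

print(f"errors that happened:       {true_errors}")
print(f"errors the tool shows:      {seen_errors}")
print(f"estimate after scaling up:  {estimated_errors:.0f}")
```

If the sample rate is buried in the docs, the team only ever sees the middle number and concludes the problem is a tenth of its real size.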
  32. Always, ALWAYS takes longer than you expect. Not a sales pitch, but if I’ve spent $10m building a product, tell me how you’re going to do it yourself in six months? I want to hire you. Also, statistics can be very hard. It also introduces the concern that maybe the bug is in the monitoring tools. There are great open source projects too, but consider the TCO of now managing that internally. DOES BUILDING IT YOURSELF CREATE BUSINESS VALUE? No. Unless you are Netflix etc.
  33. Make it easy to surface statistics, monitor data, etc. If it’s difficult, it likely won’t be added when the time pressure is on. Similar impact as with Unit Tests, oftentimes it won’t be done unless somebody else has already laid all the groundwork with mocks, fakes etc. Make it so easy that it’s not considered a real cost to add (see: impact of StatsD)
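As a sketch of how low the barrier can be, the StatsD wire format is a short text line (`name:value|type`) sent over UDP, so adding a metric is close to a one-liner. The helper names, metric names, host and port below are assumptions for illustration (8125 is the conventional StatsD port); in practice you would likely use an existing client library.

```python
import socket


def format_counter(name, value=1):
    """Build a StatsD counter line, e.g. 'checkout.errors:1|c'."""
    return f"{name}:{value}|c"


def format_timing(name, millis):
    """Build a StatsD timing line, e.g. 'db.query:42|ms'."""
    return f"{name}:{millis}|ms"


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget: send one metric line to a StatsD server over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


# Example usage with hypothetical metric names:
#   send_metric(format_counter("checkout.errors"))
#   send_metric(format_timing("db.query", 42))
```

Because UDP is fire-and-forget, the application does not wait on the metrics server, which is a large part of why adding a metric this way does not feel like a real cost.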
  34. Raygun story of a CTO’s pet project: error tracking that almost nobody in the business could use. It did some magical things; shame only one person in this company of thousands could actually use the thing… Other story: one customer had to employ a full-time person to teach the team how to use dashboards! wtfbbq
  36. We see this all the time, and it’s frustrating. Raygun story: The highest-value thing we can do is hold training sessions with the team. Story of Board Meetings (rare, but should be common). Just installing it is kind of like buying your painkillers but never actually using them when in pain.
  37. Remember how almost everything goes back to fellow humans? Look, I know it’s awesome coding away. Raygun Story: Events, taking engineers rather than sales people. 180 degree change. See the impact, feel the pain. Next-level engineer.
  38. Welcome to GDPR. Where all your ‘I will build this or cobble it together myself’ could cost your company up to 4% of revenue when you’re audited. Youch! Yet, I keep seeing this, and I think it’s the biggest threat to businesses in relation to compliance.
  39. SUM UP WHAT WE COVERED