Beyond Nagios


      NYC DevOps 2011/07/21
Alexis Lê-Quôc - alq@datadoghq.com
What I’m Going To Talk About

    • Super-quick Nagios summary

    • Monitoring/Alerting pathologies

    • How to fix it
What Is Nagios

• “Industry Standard in IT Infrastructure Monitoring”

  • For once it’s true...

• Scheduler & notification server
(+) Robust, Mature code-base

(-) Configuration can be daunting

(-) Not human-friendly
“OVERWHELMING”
A “NORMAL” HOUR
THE “OTHER” NAGIOS UI
[Cycle diagram: Receive alerts, Process alerts & fix things, Add more checks]
THE HAPPY START
[Cycle diagram: Missed alerts, Ignore alerts, Add more checks]
THE SPIRAL OF DEATH
[Chart: quality of life vs. # of alerts]
    • Few checks, few alerts
    • More checks, too many alerts
FIGHT OR FLIGHT
[Chart: effective coverage vs. scale and complexity]
    • Few checks, few alerts, every host counts
    • More checks, too many alerts, every host still counts
    • Checks n^2, fault-tolerant, less urgency
THE TROUGH OF DESPAIR
[Chart: effective coverage vs. scale]
IF ONLY I ADDED MORE CHECKS...
Reset!
Way Out
‣Breathe!
‣Measure
‣Look for Patterns
‣Put Alerts in Context
‣Focus on the Business
Turn Nagios logs into structured data
Analyze

day                 | success_pct | warning_pct | error_pct | events
--------------------+-------------+-------------+-----------+-------
2011-07-12 00:00:00 |          89 |           0 |         2 |   9628
2011-07-13 00:00:00 |          90 |           0 |         2 |   9210
2011-07-14 00:00:00 |          90 |           0 |         2 |   9735
2011-07-15 00:00:00 |          89 |           0 |         2 |   9531

MEASURE
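
A minimal sketch of the "logs into structured data" step, not the tooling used for the talk. It assumes the standard `SERVICE ALERT` line shape from a Nagios log (`[epoch] SERVICE ALERT: host;service;STATE;STATE_TYPE;attempt;output`) and assumes that OK, WARNING and CRITICAL map to the success, warning and error columns. The function names and the sample log lines are made up for illustration.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Assumed line shape:
# "[epoch] SERVICE ALERT: host;service;STATE;STATE_TYPE;attempt;output"
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_line(line):
    """Return a dict for a SERVICE ALERT line, or None for anything else."""
    match = ALERT_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["ts"] = int(event["ts"])
    event["attempt"] = int(event["attempt"])
    return event


def daily_summary(lines):
    """Group events by UTC day and compute integer percentages per state."""
    per_day = defaultdict(Counter)
    for line in lines:
        event = parse_line(line)
        if event is None:
            continue  # skip non-alert lines
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date()
        per_day[day.isoformat()][event["state"]] += 1

    rows = []
    for day in sorted(per_day):
        counts = per_day[day]
        total = sum(counts.values())
        rows.append({
            "day": day,
            # Assumed mapping: OK -> success, WARNING -> warning,
            # CRITICAL -> error. Other states only count toward "events".
            "success_pct": 100 * counts["OK"] // total,
            "warning_pct": 100 * counts["WARNING"] // total,
            "error_pct": 100 * counts["CRITICAL"] // total,
            "events": total,
        })
    return rows


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    "[1310428800] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Connection refused",
    "[1310428860] SERVICE ALERT: web01;HTTP;OK;HARD;1;HTTP OK",
    "[1310428920] SERVICE ALERT: db01;Disk;WARNING;SOFT;1;DISK WARNING - 12% free",
    "[1310428980] SERVICE ALERT: db01;Disk;OK;SOFT;2;DISK OK",
    "[1310429000] Auto-save of retention data completed successfully.",
]

for row in daily_summary(SAMPLE_LOG):
    print(row)
```

Once events are plain records like these, the daily table is a simple aggregation, and the same records can feed a database or a chart.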
VISUALIZATION MATTERS
    • In time
    • Flapping
LOOK FOR PATTERNS
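
One way to look for the flapping pattern in structured alert data is to count state changes per check inside a sliding time window. The sketch below is a deliberately simple stand-in, not Nagios's built-in flap detection; the function name, the one-hour window and the threshold of four changes are all assumptions chosen for illustration.

```python
from collections import defaultdict


def find_flapping(events, window_seconds=3600, min_changes=4):
    """Return {(host, service): max state changes seen in any window}.

    `events` is an iterable of (timestamp, host, service, state) tuples.
    Only checks that reach `min_changes` within a window are returned.
    """
    changes = defaultdict(list)  # (host, service) -> timestamps of changes
    last_state = {}
    for ts, host, service, state in sorted(events):
        key = (host, service)
        if key in last_state and last_state[key] != state:
            changes[key].append(ts)
        last_state[key] = state

    flapping = {}
    for key, stamps in changes.items():
        best = 0
        start = 0
        # Slide a window over the (already sorted) change timestamps.
        for end in range(len(stamps)):
            while stamps[end] - stamps[start] > window_seconds:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_changes:
            flapping[key] = best
    return flapping


# Hypothetical events: web01/HTTP flips state every 10 minutes,
# db01/Disk changes twice over several hours.
EVENTS = [
    (0, "web01", "HTTP", "OK"),
    (600, "web01", "HTTP", "CRITICAL"),
    (1200, "web01", "HTTP", "OK"),
    (1800, "web01", "HTTP", "CRITICAL"),
    (2400, "web01", "HTTP", "OK"),
    (0, "db01", "Disk", "OK"),
    (5000, "db01", "Disk", "WARNING"),
    (20000, "db01", "Disk", "OK"),
]

print(find_flapping(EVENTS))
```

Checks that show up here are candidates for tuning thresholds or retry settings rather than for more alerts.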
PUT ALERTS IN CONTEXT
    https://app.datad0g.com/dash/dash/1000#/date_range/1310682467000.0-1310684267000.0
Ultimate (hard) question
‣Does this alert impact the business?
 ‣If so by how much?
 ‣Assumes that you track business metrics...
 ‣And they can be accessed programmatically



FOCUS ON THE BUSINESS
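
If business metrics are available as time series, a first rough answer to "by how much?" is to compare the metric during the alert with the period just before it. This is only a sketch under that assumption: the function name, the 30-minute baseline and the "orders per minute" sample data are hypothetical, and a real comparison would also need to account for normal daily variation.

```python
from statistics import mean


def business_impact(metric_points, alert_start, alert_end, baseline_seconds=1800):
    """Compare a business metric during an alert with the period before it.

    `metric_points` is a list of (timestamp, value) pairs.
    Returns the relative change (-0.5 means the metric halved), or None
    when either window has no data or the baseline mean is zero.
    """
    baseline = [
        value for ts, value in metric_points
        if alert_start - baseline_seconds <= ts < alert_start
    ]
    during = [
        value for ts, value in metric_points
        if alert_start <= ts <= alert_end
    ]
    if not baseline or not during:
        return None
    base_mean = mean(baseline)
    if base_mean == 0:
        return None
    return (mean(during) - base_mean) / base_mean


# Hypothetical "orders per minute" samples around an alert
# that lasts from t=1800 to t=2400.
ORDERS = [(0, 100), (600, 100), (1200, 100), (1800, 50), (2400, 50)]

print(business_impact(ORDERS, alert_start=1800, alert_end=2400))
```

A value near zero suggests the alert had little visible business effect; a large negative value is a reason to treat that class of alert as urgent.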
What applies to Nagios...
Applies to other sources too, etc.
Thanks


http://datadoghq.com


