Nagios Conference 2012 - Alexis Le Quoc - Deep Dive into Nagios Analytics

Alexis Le Quoc's presentation on diving into Nagios analytics.
The presentation was given at the Nagios World Conference North America, held September 25-28, 2012, in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna


  1. A Deep Dive into Nagios Analytics. Alexis Lê-Quôc (@alq), http://datadoghq.com
  2. @alq: Dev & Ops, Nagios user since 2008, Datadog co-founder
  3. A little survey
  4. Top 3 failed checks
  5. Top 3 failed checks: that woke me up; that I responded to last week; that I responded to 5 weeks ago; that most of my team responded to at least once; that impacts our business the most?
  6. Top 3 failed checks: that woke me up; that I responded to last week; that I responded to 5 weeks ago; that most of my team responded to at least once; that impacts our business the most?
  7. Using memory to prioritize remediation: at best, finding local optima; at worst, Brownian motion
  8. Analytics
  9. Performance Metrics, Nagios, Traffic, Other Sources: in the “Cloud”
  10. Nagios: a “chatty” source, out of the 40+ sources Datadog supports
  11. One example
  12. Almost 13,000 Nagios “events” over the past week
  13. Constant stream
  14. 86 notifications!
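
Slides 12 to 14 contrast almost 13,000 Nagios events in a week with 86 notifications. A minimal sketch of how such counts could be pulled from a Nagios Core log, assuming the usual `[epoch] TYPE: field;field;...` line layout of `nagios.log` and assuming that "events" means `SERVICE ALERT`/`HOST ALERT` lines (the talk does not define the term). `parse_line` and `weekly_counts` are hypothetical helper names, and the sample lines are made up:

```python
import re

# Assumed nagios.log layout: "[epoch] TYPE: field;field;..."
LOG_LINE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")
ALERT_KINDS = {"SERVICE ALERT", "HOST ALERT"}
NOTIFICATION_KINDS = {"SERVICE NOTIFICATION", "HOST NOTIFICATION"}
WEEK = 7 * 24 * 3600  # seconds


def parse_line(line):
    """Return (epoch, kind, fields), or None for lines that do not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ts, kind, rest = m.groups()
    return int(ts), kind, rest.split(";")


def weekly_counts(lines, now):
    """Count alert and notification lines logged in the 7 days up to `now`."""
    alerts = notifications = 0
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, kind, _ = parsed
        if not now - WEEK <= ts <= now:
            continue
        if kind in ALERT_KINDS:
            alerts += 1
        elif kind in NOTIFICATION_KINDS:
            notifications += 1
    return alerts, notifications


# Hypothetical sample lines (not data from the talk).
sample = [
    "[1000] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;Socket timeout",
    "[2000] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Socket timeout",
    "[2000] SERVICE NOTIFICATION: ops;web01;HTTP;CRITICAL;notify-service-by-email;Socket timeout",
    "[3000] LOG ROTATION: DAILY",
]
print(weekly_counts(sample, now=5000))  # (2, 1)
```

Lines such as `LOG ROTATION` parse but are counted as neither alerts nor notifications.
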
  15. Pattern
  16. Pattern
  17. More data? More questions.
  18. A dialog with data (not a scientific study)
  19. Population by quartile: 25% = 20, 50% = 93, 75% = 322, 100% = 904
  20. Does size matter?
  21. Weekly count per host, split by quartile
  22. Weekly count per host, split by quartile. Outliers: sick hosts, silenced checks
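
Slides 19 to 22 bucket installations into quartiles and compare weekly alert counts per host. A sketch of that bucketing, assuming the quartiles partition installations by number of monitored hosts (the slides do not say so explicitly) and that each installation's weekly alert count is already known. `statistics.quantiles` is from the Python standard library (3.8+); the helper names and the sample data are hypothetical:

```python
import statistics


def quartile_of(value, cuts):
    """Return 1-4: the quartile `value` falls into, given three cut points."""
    for i, cut in enumerate(cuts, start=1):
        if value <= cut:
            return i
    return 4


def alerts_per_host_by_quartile(installs):
    """Group installations by size quartile.

    installs maps a name to (host_count, weekly_alert_count); the result maps
    each quartile (1-4) to a list of weekly alerts per host.
    """
    sizes = [hosts for hosts, _ in installs.values()]
    cuts = statistics.quantiles(sizes, n=4, method="inclusive")
    buckets = {1: [], 2: [], 3: [], 4: []}
    for hosts, alerts in installs.values():
        buckets[quartile_of(hosts, cuts)].append(alerts / hosts)
    return buckets


# Hypothetical installations: name -> (hosts, alerts in one week).
installs = {
    "a": (10, 50),
    "b": (20, 40),
    "c": (90, 900),
    "d": (300, 600),
    "e": (900, 9000),
}
by_quartile = alerts_per_host_by_quartile(installs)
print({q: statistics.median(v) for q, v in by_quartile.items() if v})
# {1: 3.5, 2: 10.0, 3: 2.0, 4: 10.0}
```

Medians are used here as one option that is less sensitive to outliers like the sick hosts and silenced checks on slide 22; the talk itself does not say which statistic it plotted.
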
  23. Notifications
  24. Notifications: 1-3% of alerts notify; little difference per quartile
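
The 1-3% figure on slide 24 is a ratio of notifications to alerts. One way to approximate it from `nagios.log`, assuming Nagios Core's field order for notification lines and counting every `SERVICE ALERT`/`HOST ALERT` line (soft and hard) as an alert; both choices are assumptions, not the talk's method. Because Nagios writes one notification line per contact, the sketch de-duplicates them first:

```python
import re

LOG_LINE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def notification_ratio(lines):
    """Distinct notified events divided by alert lines.

    Assumed nagios.log field order:
      SERVICE NOTIFICATION: contact;host;service;state;command;output
      HOST NOTIFICATION:    contact;host;state;command;output
    One notification line is logged per contact, so notifications are
    de-duplicated on (timestamp, host, service) before dividing.
    """
    alerts = 0
    notified = set()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue
        ts, kind, rest = m.groups()
        fields = rest.split(";")
        if kind in ("SERVICE ALERT", "HOST ALERT"):
            alerts += 1
        elif kind == "SERVICE NOTIFICATION":
            notified.add((ts, fields[1], fields[2]))
        elif kind == "HOST NOTIFICATION":
            notified.add((ts, fields[1], None))
    return len(notified) / alerts if alerts else 0.0


# Hypothetical sample: 4 alerts, one of which paged two contacts.
sample = [
    "[100] SERVICE ALERT: db01;Disk;WARNING;SOFT;1;80% used",
    "[160] SERVICE ALERT: db01;Disk;WARNING;HARD;3;80% used",
    "[160] SERVICE NOTIFICATION: alice;db01;Disk;WARNING;notify-service-by-email;80% used",
    "[160] SERVICE NOTIFICATION: bob;db01;Disk;WARNING;notify-service-by-email;80% used",
    "[200] HOST ALERT: web02;DOWN;SOFT;1;PING CRITICAL",
    "[260] HOST ALERT: web02;UP;SOFT;1;PING OK",
]
print(notification_ratio(sample))  # 0.25
```
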
  25. Does time of day matter?
  26. Mean about the same across quartiles. Time-based deviation?
  27. Does the day of week matter?
  28. Not really
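
Slides 25 to 28 ask whether time of day and day of week matter. A sketch that buckets alert lines by hour and weekday, assuming epoch timestamps as in `nagios.log`; `alert_histograms` is a hypothetical name, and UTC is only a default, since "time of day" is usually more meaningful in the timezone of the team being paged:

```python
import re
from collections import Counter
from datetime import datetime, timezone

LOG_LINE = re.compile(r"^\[(\d+)\] ([A-Z ]+): ")


def alert_histograms(lines, tz=timezone.utc):
    """Count alert lines by hour of day (0-23) and weekday (0 = Monday)."""
    by_hour, by_weekday = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m.group(2) not in ("SERVICE ALERT", "HOST ALERT"):
            continue
        when = datetime.fromtimestamp(int(m.group(1)), tz=tz)
        by_hour[when.hour] += 1
        by_weekday[when.weekday()] += 1
    return by_hour, by_weekday


# Hypothetical sample; epoch 0 is Thursday 1970-01-01 00:00 UTC.
sample = [
    "[0] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;timeout",
    "[46800] HOST ALERT: web01;DOWN;HARD;3;PING CRITICAL",
    "[345600] SERVICE ALERT: db01;Disk;WARNING;SOFT;1;80% used",
    "[345600] SERVICE NOTIFICATION: ops;db01;Disk;WARNING;notify-service-by-email;80% used",
]
hours, weekdays = alert_histograms(sample)
print(sorted(hours.items()), sorted(weekdays.items()))
# [(0, 2), (13, 1)] [(0, 1), (3, 2)]
```
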
  29. Squeaky wheels? (checks)
  30. Outlier
  31. Outlier in more detail
  32. Long tail
  33. Squeaky wheels? (hosts)
  34. Same outlier
  35. Similar pattern to checks
  36. Long tail
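
Slides 29 to 36 look for squeaky wheels among checks and hosts and find a long tail. A sketch that ranks both with `collections.Counter` and reports how much of the total the top checks account for, assuming a "check" is a host and service pair taken from `SERVICE ALERT` lines (host alerts are ignored here for brevity); `squeaky_wheels` is a hypothetical name:

```python
import re
from collections import Counter

# Captures host and service from "[epoch] SERVICE ALERT: host;service;..."
LOG_LINE = re.compile(r"^\[\d+\] SERVICE ALERT: ([^;]*);([^;]*);")


def squeaky_wheels(lines, top=3):
    """Rank checks (host, service) and hosts by number of SERVICE ALERT lines.

    Returns (top checks, top hosts, share of all alerts from the top checks).
    """
    checks, hosts = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue
        host, service = m.groups()
        checks[(host, service)] += 1
        hosts[host] += 1
    total = sum(checks.values())
    top_checks = checks.most_common(top)
    share = sum(n for _, n in top_checks) / total if total else 0.0
    return top_checks, hosts.most_common(top), share


# Hypothetical sample: one noisy check and a few quiet ones.
sample = (
    ["[1] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;timeout"] * 6
    + ["[2] SERVICE ALERT: db01;Disk;WARNING;SOFT;1;80% used"] * 2
    + ["[3] SERVICE ALERT: web02;Load;WARNING;SOFT;1;load 9"]
    + ["[4] SERVICE ALERT: web01;SSH;CRITICAL;SOFT;1;refused"]
)
print(squeaky_wheels(sample, top=2))
# ([(('web01', 'HTTP'), 6), (('db01', 'Disk'), 2)], [('web01', 7), ('db01', 2)], 0.8)
```

A large share from a handful of checks points at the outlier; the remainder spread thinly across many checks is the long tail.
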
  37. Recurring alerts
  38. Happens often vs. seldom happens; young vs. old
  39. Occur often, for a long time: tolerated; happen once in a while
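
Slides 37 to 39 place recurring alerts on two axes, how often they happen and how old they are, with alerts that occur often for a long time labeled "Tolerated". A sketch of that classification, assuming age means the span between the first and last alert for a host and service pair; `classify_recurring` is a hypothetical name, and the `often` and `old_days` thresholds are illustrative defaults, not values from the talk:

```python
import re

LOG_LINE = re.compile(r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);")
DAY = 24 * 3600  # seconds


def classify_recurring(lines, often=10, old_days=30):
    """Place each (host, service) on an often/seldom x young/old grid.

    `often` is an alert count and `old_days` a first-to-last-alert span;
    both are illustrative thresholds, not values from the talk.
    """
    first, last, count = {}, {}, {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue
        ts, key = int(m.group(1)), (m.group(2), m.group(3))
        first[key] = min(first.get(key, ts), ts)
        last[key] = max(last.get(key, ts), ts)
        count[key] = count.get(key, 0) + 1
    grid = {}
    for key, n in count.items():
        frequency = "often" if n >= often else "seldom"
        age = "old" if last[key] - first[key] >= old_days * DAY else "young"
        grid[key] = (frequency, age)
    return grid


# Hypothetical sample: a check alerting every 5 days for 55 days,
# and one that fired twice within 2 days.
sample = [
    f"[{d * DAY}] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;timeout"
    for d in range(0, 60, 5)
] + [
    "[0] SERVICE ALERT: db01;Disk;WARNING;SOFT;1;80% used",
    f"[{2 * DAY}] SERVICE ALERT: db01;Disk;WARNING;SOFT;1;80% used",
]
print(classify_recurring(sample))
# {('web01', 'HTTP'): ('often', 'old'), ('db01', 'Disk'): ('seldom', 'young')}
```

In this sketch the ("often", "old") cell is where the "Tolerated" alerts from slide 39 would land.
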
  40. More data? More questions.
  41. HOWTO?
  42. Awk, R, d3, Postgres: find out tomorrow!
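
Slide 42 names awk, R, d3, and Postgres but defers the details. As a stand-in for the Postgres step, the sketch below loads parsed alerts into Python's built-in `sqlite3` (swapped in so the example runs without a database server) and answers a "squeaky wheel" question with a `GROUP BY`; `load_alerts`, the `alerts` table, and its columns are hypothetical:

```python
import re
import sqlite3

LOG_LINE = re.compile(r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([^;]*);")


def load_alerts(lines):
    """Load SERVICE ALERT lines into a hypothetical in-memory `alerts` table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE alerts (ts INTEGER, host TEXT, service TEXT, state TEXT)")
    rows = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            ts, host, service, state = m.groups()
            rows.append((int(ts), host, service, state))
    db.executemany("INSERT INTO alerts VALUES (?, ?, ?, ?)", rows)
    return db


# Hypothetical sample lines.
sample = [
    "[100] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;timeout",
    "[200] SERVICE ALERT: web01;HTTP;OK;SOFT;1;HTTP OK",
    "[300] SERVICE ALERT: db01;Disk;WARNING;HARD;3;80% used",
]
db = load_alerts(sample)
query = "SELECT host, COUNT(*) AS n FROM alerts GROUP BY host ORDER BY n DESC, host"
print(db.execute(query).fetchall())  # [('web01', 2), ('db01', 1)]
```

The `GROUP BY` query itself is plain SQL and should also run on Postgres once the rows are loaded there.
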
  43. Presentation matters
  44. Take-away?
  45. Take-aways:
      • Don’t rely on your memory
      • Your Nagios logs are a treasure trove
      • Have a dialog with your data
      • Presentation matters
