

Monitoring Is Never Done

  1. Monitoring is Never “Done” @melaniemj
  2. Responsibilities @ Yardi: implementation and administration of monitoring, alerting, and log aggregation/analysis tools. 15,000+ devices; 9 datacenters; 5,000+ customer installations. We monitor Windows envs with Linux envs.
  3. This was me in 2008 @ Point2
  4. How code is delivered
  5. How code operates in production
  6. A good problem to have: everyone wants “the monitoring” so they can say “it’s monitored”
  7. Communicating Work: Classify; Quantify; Qualify
  8. Words... Logging; Alerting; Dashboards; Reports; 4-9s; 24x7x365 this shit can’t go down
  9. Can it be this simple? Let’s talk about “the monitoring” for X -> Be awesome -> X is monitored
  10. DCVA (OODA): Definition, Checks & Collections, Visualization, Action
  11. 1. Definition: “I can hit this one page, so it’s up, right?” No thanks, let’s redefine status.
  12. 1. Definition: What questions are you trying to answer? What information do you need when a failure occurs? What are the most common failures? Who is the audience for the information?
  13. 2. Checks & Collections: environment and code; data points; detailed logs; current state
  14. 3. Visualization: analysis; dashboards; correlations
  15. 4. Action: fault detection; alerting; RCA
  16. Cycle: what to collect (Definition), how to collect (Checks & Collections), make collections pretty (Visualization), inform on failure (Action), then back to what to collect
  17. Team Time Distribution
  18. Time Distribution (Desired)
  19. Is “X” monitored? When “X” goes into some degraded state: the right people know; they have enough information to find the problem, recover, and later do RCA; if they don’t, they revisit the definition.
  20. How does your team classify, quantify, and qualify?
  21. Monitoring is Never “Done” Melanie Cey @melaniemj Senior Systems Analyst Systems Reliability Engineering @ Yardi
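
The four steps on slides 10 to 15 (Definition, Checks & Collections, Visualization, Action) can be read as a small loop. The Python below is only a rough sketch of that reading, not anything shown in the talk: the names Definition, collect, visualize, and act, the "checkout" service, the "oncall-web" audience, and the thresholds are all hypothetical. It plays out the slide 11 case, where the page answers but the service is still degraded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Definition:
    """Step 1 (hypothetical shape): what "healthy" means for X, and who must know."""
    name: str
    audience: List[str]  # the "right people" to inform on failure
    # Data-point name -> predicate that returns True when the value is healthy.
    healthy_when: Dict[str, Callable[[float], bool]] = field(default_factory=dict)


def collect(raw: Dict[str, float], definition: Definition) -> Dict[str, float]:
    """Step 2: keep only the data points the definition asks about."""
    return {k: v for k, v in raw.items() if k in definition.healthy_when}


def visualize(points: Dict[str, float]) -> str:
    """Step 3: a one-line stand-in for a dashboard."""
    return ", ".join(f"{k}={v}" for k, v in sorted(points.items()))


def act(points: Dict[str, float], definition: Definition) -> List[str]:
    """Step 4: detect faults and build one alert per audience member.

    A data point that is missing counts as failing, because the question
    the definition asks cannot be answered without it.
    """
    failing = sorted(
        name
        for name, is_healthy in definition.healthy_when.items()
        if name not in points or not is_healthy(points[name])
    )
    if not failing:
        return []
    summary = visualize(points)
    return [
        f"to {who}: {definition.name} degraded ({', '.join(failing)}); {summary}"
        for who in definition.audience
    ]


# Illustrative values only: the page returns 200, yet latency says "degraded".
checkout = Definition(
    name="checkout",
    audience=["oncall-web"],
    healthy_when={
        "http_status": lambda status: status == 200,
        "p95_latency_ms": lambda ms: ms < 500,
    },
)
raw_sample = {"http_status": 200, "p95_latency_ms": 1800, "cpu_pct": 40}
for alert in act(collect(raw_sample, checkout), checkout):
    print(alert)
```

If an alert like this does not carry enough information to find the problem, the fix in this sketch is to change the Definition (add a data point or an audience member), which is the "revisit definition" step from slide 19.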

Editor's Notes

  1. What to collect
  2. How to collect
  3. Make collections pretty
  4. Inform on failure