
StatsCraft 2015: The problem (Keynote) - Nir Cohen



Slides of Nir Cohen's talk at StatsCraft 2015

Published in: Technology


  1. StatsCraft Monitoring Conference. Website and agenda: twitter: (#statscraft) facebook: email: @statscraft
  2. Agenda
     1. Understand the problem.
     2. Understand what monitoring is.
     3. Example use-case(s).
     4. A different approach.
     5. Learn methodologies and tools.
  3. The Problem. Nir Cohen @ Gigaspaces, @thinkops
  4. We monitor because... we want to satisfy the customer. (Make money?)
  5. Automated Resource Provisioning. Configuration Management. Automated Code Deployment. Continuous Whatever. Monitoring: still underrated... PROBLEM!
  6. Blame the tools?
  7. Problem origin. DISCLAIMER
  8. We're monitoring the wrong things. _rootCauseAnalysis: the alternative is harder.
  9. We're considering logs a second-class citizen. _rootCauseAnalysis: the alternative is harder.
  10. Our data is lacking. _rootCauseAnalysis: inertia. That's how it was, that's how it is.
  11. We separate monitoring from the application. _rootCauseAnalysis: we're not used to this. (Ops problem)
  12. We monitor reactively, not proactively. _rootCauseAnalysis: reaction requires less initial energy than anticipation.
  13. We put uptime above system and product quality. _rootCauseAnalysis: it's much easier.
  14. We deal with hard limits. _rootCauseAnalysis: arbitrary numbers are easier to set.
  15. Monitoring is non-functional but resource-hungry. _rootCauseAnalysis: we just don't accept it.
  16. Good monitoring requires the right people, not just Ops! _rootCauseAnalysis: delegation is natural. Others have more important things to do.
  17. Alert fatigue is common. _rootCauseAnalysis: solving issues is much easier than solving problems, and apparently, we are addicted to non-actionable alerts.
  18. We're auto-scaling prematurely. _rootCauseAnalysis: brute force is natural.
  19. We're choosing the wrong tools. _rootCauseAnalysis: it's easier to choose the tool than to choose what to monitor.
  20. Good monitoring is hard. _rootCauseAnalysis: systems become complex, so they're harder to monitor.
  21. So, after all, why do we not monitor properly?
      1. Simplification
      2. Delegation
      3. Rationalization
      _rootCauseAnalysis:
  22. No fear, "Let's see how we can make this all better" is here!
  23. “If a service crashes and no one is around to monitor it, does it raise an alert?”
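Slide 14 argues that we lean on hard limits because arbitrary numbers are easier to set. One common alternative, swapped in here purely for illustration and not taken from the talk, is to alert on deviation from a rolling baseline: mean plus k standard deviations over recent samples. The minimal Python sketch below contrasts the two, assuming a latency metric sampled at regular intervals; the 500 ms limit, the 5-sample window, k = 3 and the 1 ms sigma floor are hypothetical values chosen only for the example.

```python
from statistics import mean, stdev

# Hypothetical hard limit: an arbitrary number picked up front (slide 14).
STATIC_LIMIT_MS = 500.0


def static_alert(latency_ms: float) -> bool:
    """Fire whenever a sample crosses the fixed, hand-picked limit."""
    return latency_ms > STATIC_LIMIT_MS


def baseline_alert(history: list[float], latency_ms: float,
                   k: float = 3.0, min_sigma: float = 1.0) -> bool:
    """Fire when a sample exceeds mean + k * stddev of recent history.

    min_sigma (hypothetical) keeps a perfectly flat history from turning
    every tiny increase into an alert.
    """
    if len(history) < 2:
        return False  # stdev needs at least two points; no baseline yet
    mu = mean(history)
    sigma = max(stdev(history), min_sigma)
    return latency_ms > mu + k * sigma


def evaluate(samples: list[float], window: int = 5, k: float = 3.0):
    """Compare both strategies on every sample that has a full window behind it.

    Returns a list of (index, value, static_fired, baseline_fired) tuples.
    """
    results = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]  # the `window` samples before i
        value = samples[i]
        results.append((i, value, static_alert(value),
                        baseline_alert(history, value, k)))
    return results
```

For a service that is steady at about 600 ms, `evaluate([600, 610, 590, 605, 595, 600, 602])` flags both post-warm-up samples under the static limit and neither under the baseline, which is the kind of non-actionable alert slide 17 describes. For a service that normally answers in about 50 ms and jumps to 300 ms, `evaluate([50, 52, 48, 51, 49, 300])` stays silent under the static limit while the baseline check fires. A real setup would also need to handle seasonality and keep anomalous samples from dragging the baseline along with them.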