Ever since the #monitoringsucks trend kicked off a conversation about the state of monitoring tools in 2011, there has been a flurry of activity resulting in new solutions, improved tools, and applications generating tons of data. However, almost four years later, we still face the same issues. Alerts still generate far too much noise to be useful. Dashboards aren’t actionable and require human interpretation. The sheer volume of log, time series, and other data makes that data difficult to collate, visualize, and interpret in the mythical single pane of glass. How do we definitively solve these problems?
Data science. Using advances in data science and machine learning that are already being applied to “sexy” problems at companies around the globe, we can finally reach a tipping point when it comes to #monitoringsucks issues. New data science tools can pinpoint problems before a metric crosses a static threshold, group alerts from a variety of sources into a single logical error, and spare us the eye strain of studying hundreds of graphs. In this talk, I will discuss the virtues – and pitfalls – of new monitoring entrants like Kale from Etsy, Bosun from Stack Exchange, and Twitter’s open-source R package AnomalyDetection.