Often what you monitor and get alerted on is defined by your tools, rather than what makes the most sense to you and your organisation. Alerts on metrics such as CPU usage which are noisy and rarely spot real problems, while outages go undetected. Monitoring systems can also be challenging to maintain, and overall provide a poor return on investment.
In the past few years several new monitoring systems have appeared with more powerful semantics and which are easier to run, which offer a way to vastly improve how your organisation operates Prometheus is one such system. This talk will look at the monitoring ideal and how whitebox monitoring with a time series database, multi-dimensional labels and a powerful querying/alerting language can free you from midnight pages.