Nobody likes false negatives. When your Nagios probes fail to detect a problem, it can hurt your sales, your reputation, and even your ego (especially your ego). The solution: tune the thresholds. Right? You can handle a couple spurious late-night pages if it means you’ll reliably detect real failures.
I will argue that – while easy – exchanging false negatives for false positives does more harm than good. Borrowing the medical concepts of specificity and sensitivity, I’ll show how deceptive this tradeoff can be. I’ll also make the case that putting in the extra effort to minimize both types of falsehoods is necessary and healthy. When the alarm goes off, you shouldn’t have to spend precious minutes sniffing for smoke.