9. Solution: alert only on things that meet the following
criteria:
1) actionable: can I do something about this?
2) business-breaking: does this break the business now or imminently?
3) urgent: it cannot wait until morning
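The three criteria can be sketched as a simple paging filter. This is a minimal illustration, not a prescribed implementation: it reads the list as "all three must hold", and the `Alert` class, its field names, and the sample alerts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical fields, one per criterion on the slide.
    name: str
    actionable: bool             # can the recipient do something about it?
    breaks_business: bool        # is the business broken now or about to be?
    can_wait_till_morning: bool  # would nothing be lost by handling it tomorrow?


def should_page(alert: Alert) -> bool:
    """Page a human only when all three criteria hold (an assumed reading)."""
    return (
        alert.actionable
        and alert.breaks_business
        and not alert.can_wait_till_morning
    )


# Made-up examples: one alert worth waking someone for, one that is not.
checkout_down = Alert("checkout-down", True, True, False)
disk_70_percent = Alert("disk-70-percent", True, False, True)

print(should_page(checkout_down))
print(should_page(disk_70_percent))
```

Anything that fails the filter can still be recorded or emailed; it just should not wake anyone up.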
11. 2nd Deadly sin of monitoring
A single team does the monitoring; everyone
else is second tier
12. Solution: direct alerts to the relevant parties
1) only the person who can fix the problem gets alerted; others get emails
2) the system needs to be smart enough to make that choice, and must be fixed when it
makes a mistake and wakes up the wrong person
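A rough sketch of both points: the owner of a component is paged, everyone else gets an email, and a wrong page is fed back so the routing is corrected. The `Router` class, the `owners` mapping, and the names used are all hypothetical; a real system would pull ownership from an on-call schedule or service catalog.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Router:
    # Hypothetical mapping: component name -> the person who can fix it.
    owners: dict[str, str]
    everyone: list[str]
    # Log of routing fixes: (component, previous owner, corrected owner).
    corrections: list[tuple[str, Optional[str], str]] = field(default_factory=list)

    def route(self, component: str) -> dict[str, list[str]]:
        """Page only the owner; everyone else is informed by email."""
        owner = self.owners.get(component)
        page = [owner] if owner else []
        email = [person for person in self.everyone if person != owner]
        return {"page": page, "email": email}

    def report_wrong_page(self, component: str, correct_owner: str) -> None:
        """Fix the routing after the wrong person was woken up."""
        previous = self.owners.get(component)
        self.owners[component] = correct_owner
        self.corrections.append((component, previous, correct_owner))


router = Router(owners={"database": "dana"}, everyone=["dana", "lee", "sam"])
first = router.route("database")

# Dana was woken up but Lee is the one who can fix it: correct the mapping.
router.report_wrong_page("database", "lee")
second = router.route("database")

print(first)
print(second)
```

Keeping the `corrections` log is one way to review how often the routing gets it wrong.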
16. Solution: monitoring needs to be part of the design
the empty error: the classic example is null pointer exceptions in Java
make your developers accountable for empty errors
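One way to picture this, reading "empty error" as an error that says nothing about what went wrong (an interpretation, not a definition from the slides): fail at the point where the missing value is detected, with a message that names what failed and what to check. The sketch uses Python rather than Java, and `LookupFailed`, `find_customer`, and the hint text are hypothetical.

```python
class LookupFailed(Exception):
    """Hypothetical error type that cannot be raised without context."""

    def __init__(self, what: str, key: str, hint: str):
        if not (what and key and hint):
            raise ValueError("an error must say what failed, for which key, and what to check")
        super().__init__(f"{what} failed for {key!r}: {hint}")


def find_customer(customers: dict[str, str], customer_id: str) -> str:
    name = customers.get(customer_id)
    if name is None:
        # Fail here, with context, instead of letting None travel on and
        # surface later as an uninformative error far from the cause.
        raise LookupFailed(
            "customer lookup",
            customer_id,
            "check that the ID exists in the customer store",
        )
    return name


try:
    find_customer({"c-1": "Ada"}, "c-42")
except LookupFailed as exc:
    message = str(exc)
    print(message)
```

Whoever receives the alert then sees which lookup failed and for which ID, rather than a bare stack trace.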
19. Solutions:
self-correcting metrics: if an alert goes off for a metric and we decide it wasn’t a
real error, a dialog for changing the threshold should pop up.
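A minimal sketch of that feedback loop, under stated assumptions: the alert fires when a value exceeds an upper threshold, values are positive, and the "dialog" is modeled as an `accept` callback that approves or rejects a proposed threshold. `MetricAlert`, the 10% margin, and the CPU numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MetricAlert:
    metric: str
    threshold: float

    def fires(self, value: float) -> bool:
        return value > self.threshold

    def propose_threshold(self, false_alarm_value: float, margin: float = 0.1) -> float:
        """Suggest a threshold somewhat above a value judged not to be a real error."""
        return max(self.threshold, false_alarm_value * (1 + margin))

    def mark_false_alarm(self, false_alarm_value: float,
                         accept: Callable[[float], bool]) -> None:
        """Stand-in for the dialog: offer a new threshold, apply it if accepted."""
        proposal = self.propose_threshold(false_alarm_value)
        if accept(proposal):
            self.threshold = proposal


cpu = MetricAlert("cpu_percent", threshold=80.0)
fired_before = cpu.fires(85.0)

# The on-call person decides 85% was not a real error and accepts the proposal.
cpu.mark_false_alarm(85.0, accept=lambda proposed: True)
fired_after = cpu.fires(85.0)

print(fired_before, fired_after)
```

The point of the callback is that a human still confirms the change; the system only makes adjusting the threshold the easy next step after a false alarm.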