PagerDuty's very own Owen Kim had the misfortune of watching the company's abused, under-provisioned Cassandra cluster collapse. This presentation covers the lessons learned from that experience, including:
• Which of the many, many metrics we learned to watch
• What mistakes we made that led to this catastrophe
• How we have changed our usage to make our Cassandra cluster more stable