There’s a reason “i test in prod” isn’t a cheeky take but a lived reality. And that reason is there is no place like production. Not local dev or staging or other environment. This became clear when I deployed a tiny config change that passed all checks, reviews and pre-production environments that triggered a SEV-1. Examining each step of the journey from PR to production I uncovered the snafu that had occurred (unsurprisingly it relates to overwriting key blocks on nested YAML files). I’ll share how difficult it was to reconstruct the chain of events in the system compared to the ideal case of a highly observable system and how to share your own incident learnings since we all test in prod!