Breaking stuff is part of being a developer, but that never makes it any easier when it happens to you. The Elasticsearch outage of 2017 was the biggest outage our company has ever experienced. We drifted between full-blown downtime and degraded service for almost a week. However, it taught us a lot about how we can better prepare for and handle upgrades in the future. It also bonded our team together and highlighted the important role teamwork and leadership play in high-stress situations. The lessons learned are ones that we will not soon forget. In this talk, I will share those lessons and our story in hopes that others can learn from our experiences and be better prepared when they execute their next big upgrade.