Working in Web Operations means dealing with production systems that in most cases need to be operational 24×7×365.
To reach 99.99999% uptime (roughly three seconds of downtime per year), you must fail as little as possible.
This talk will go through a few real-world incidents and failures experienced by our small WebOps team and outline what we are learning (the hard way) and how we’re trying to improve.
What could possibly go wrong? :-)