This document covers several topics in cloud computing, including:
- The challenges of maintaining virtual, multi-tenant systems running on massive infrastructure and the need for compulsory maintenance.
- How cloud systems are designed to be resilient by running services atop unreliable systems and allowing them to be scaled and changed dynamically.
- The importance of collecting metrics from within services to understand latency, queue depth, and worker activity, and of ensuring that any engineer can access performance data and create new metrics.
- Examples of open source tools, such as Heka and Riemann, that can be used to collect and analyze metrics.
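
To make the idea of measuring from within a service concrete, here is a minimal sketch in Python. It is not taken from the document and does not use the APIs of Heka or Riemann; the `MetricsRegistry` class, the `handle_jobs` function, and the metric names are hypothetical, chosen only to show latency, queue depth, and worker count being recorded by the service itself.

```python
import time
from collections import defaultdict, deque
from contextlib import contextmanager


class MetricsRegistry:
    """Hypothetical in-process registry (not the Heka or Riemann API)."""

    def __init__(self):
        self.gauges = {}                  # latest value per metric name
        self.timings = defaultdict(list)  # latency samples per metric name

    def gauge(self, name, value):
        # Any engineer can introduce a new metric just by naming it.
        self.gauges[name] = value

    @contextmanager
    def timer(self, name):
        # Record how long the wrapped block takes, in seconds.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name].append(time.perf_counter() - start)

    def snapshot(self):
        # Flatten gauges and timing summaries into one dict that could be
        # shipped to a collector of your choice.
        out = dict(self.gauges)
        for name, samples in self.timings.items():
            out[name + ".count"] = len(samples)
            out[name + ".max_s"] = max(samples)
        return out


def handle_jobs(registry, jobs, workers=2):
    """Toy service loop that reports on itself while it works."""
    pending = deque(jobs)
    registry.gauge("workers.active", workers)
    results = []
    while pending:
        registry.gauge("queue.depth", len(pending))
        job = pending.popleft()
        with registry.timer("job.latency"):
            results.append(job * 2)  # stand-in for real work
    registry.gauge("queue.depth", 0)
    return results
```

Calling `handle_jobs(registry, [1, 2, 3])` and then `registry.snapshot()` yields a dictionary with the worker count, the final queue depth, and a summary of the job latencies. In a real deployment the snapshot would be forwarded to whichever collection tool is in use.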