Surge 2014: From Clouds to Roots: root cause performance analysis at Netflix. Brendan Gregg.
At Netflix, high scale and fast deployment rule. The possibilities for failure are endless, and the environment excels at handling them: it is regularly tested and exercised by the Simian Army. But when this environment automatically works around systemic issues that aren’t root-caused, those issues can grow over time. This talk covers not just the challenge of handling failures of scale on the Netflix cloud, but also new approaches and tools for quickly diagnosing their root causes in an ever-changing environment.