Slides originally written in April 2013 for a private conference and internal use at Netflix. Publishing now since Heartbleed is another example of an epidemic failure mode.
11. What to do?
Automated diversity management
Diversified automation
Efficient vs. Antifragile
12. Specific Ideas
• Automate running a mixture
– Diversity as default for any service stack
– No developer overhead, stay agile, low cost
• Support oldest and newest versions together
– Automate running 50/50 mix CentOS/Ubuntu
– Mix versions of JDK, Tomcat, etc.
• Vendor diversity
– Multiple DNS vendors and cloud regions: costs more
– Multiple cloud vendors? Much higher cost.
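A minimal sketch of the "automate running a mixture" idea, assuming instances are handed out round-robin across variants. The function name assign_mix, the instance IDs, and the variant labels are hypothetical and only illustrate the 50/50 CentOS/Ubuntu split; this is not Netflix tooling.

```python
from collections import Counter
from itertools import cycle


def assign_mix(instance_ids, variants):
    """Assign one variant to each instance, cycling through the variants
    so the resulting mix is as even as possible."""
    return dict(zip(instance_ids, cycle(variants)))


# Hypothetical instance IDs, for illustration only.
instances = [f"i-{n:04d}" for n in range(10)]
mix = assign_mix(instances, ["centos", "ubuntu"])

# Count how many instances ended up on each variant.
counts = Counter(mix.values())
print(counts)
```

Round-robin keeps the split exact whenever the instance count is a multiple of the number of variants, and needs no input from the developer.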
14. Deployment
• Builds
– Build manually to test, automate if it works
– Modify build to generate permutation AMIs
– Modify Asgard to auto-deploy permutations
• Data collection
– Tag each instance with its permutation
– Gather metrics by permutation per instance
– Do R-based Design of Experiments analysis
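The build and tagging steps above could be sketched as follows. This assumes each component has an "oldest" and a "newest" variant, which are placeholder labels rather than real version numbers. COMPONENTS, permutations, and permutation_tag are hypothetical names; the sketch does not call the real build system or the Asgard API.

```python
from itertools import product

# Placeholder component variants (assumption); real version strings go here.
COMPONENTS = {
    "os": ["centos", "ubuntu"],
    "jdk": ["oldest", "newest"],
    "tomcat": ["oldest", "newest"],
}


def permutations(components):
    """Yield every combination of component variants as a dict."""
    names = list(components)
    for combo in product(*(components[name] for name in names)):
        yield dict(zip(names, combo))


def permutation_tag(perm):
    """Build a stable tag string so metrics can be grouped by permutation."""
    return "_".join(f"{key}-{value}" for key, value in sorted(perm.items()))


all_perms = list(permutations(COMPONENTS))
tags = [permutation_tag(perm) for perm in all_perms]
for tag in tags:
    print(tag)
```

Each tag would be attached to the instances built from that permutation, so that per-instance metrics can later be grouped by it.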
15. Analysis
• As a function of permutations
– Error rate
– Response time
– CPU Utilization
• Interactions
– E.g. interaction between Linux and Java
– Contrasts identify components with issues
– Small changes with high statistical significance
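To make the contrasts concrete, here is a sketch of main effects and an interaction contrast for a 2x2 design (OS by JDK). The response times are invented for illustration and are not measurements. The slides propose R for this analysis; Python is used here only as a stand-in, and the sketch computes contrasts only, with no significance test.

```python
from statistics import mean

# Invented response times in ms per (os, jdk) permutation. Illustration only.
samples = {
    ("centos", "old"): [99, 101],
    ("centos", "new"): [101, 103],
    ("ubuntu", "old"): [100, 102],
    ("ubuntu", "new"): [114, 116],
}

# Mean response time for each permutation.
cell = {perm: mean(values) for perm, values in samples.items()}

# Main effect: average over one factor, difference across the other.
os_effect = (
    mean([cell[("ubuntu", "old")], cell[("ubuntu", "new")]])
    - mean([cell[("centos", "old")], cell[("centos", "new")]])
)
jdk_effect = (
    mean([cell[("centos", "new")], cell[("ubuntu", "new")]])
    - mean([cell[("centos", "old")], cell[("ubuntu", "old")]])
)

# Interaction contrast: half the difference between the JDK effect on
# Ubuntu and the JDK effect on CentOS.
interaction = (
    (cell[("ubuntu", "new")] - cell[("ubuntu", "old")])
    - (cell[("centos", "new")] - cell[("centos", "old")])
) / 2

print(os_effect, jdk_effect, interaction)
```

In this made-up data the interaction contrast is nearly as large as either main effect, which is the pattern that would point to a problem specific to one OS and JDK combination rather than to either component alone.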
17. Takeaway
Watch out for monocultures
A|B Testing – it’s not just for personalization
http://perfcap.blogspot.com
http://slideshare.net/adrianco – Netflix
http://slideshare.net/adriancockcroft – Battery
http://www.linkedin.com/in/adriancockcroft
@adrianco @BatteryVentures