Netflix: A State of Xen - Chaos Monkey & Cassandra

A deep look at how Netflix operates its Cassandra fleet and how we survived the 2014 AWS Re:Boot.

Published in: Technology

  1. A State of Xen - Chaos Monkey & Cassandra
  2. Who we are: Jean-Sebastien Jeannotte (JS), Senior Software Engineer, Platform Automation Engineering, jjeannotte@netflix.com, @jsjeannotte, http://www.linkedin.com/in/jsjeannotte; Nir Alfasi, Senior Software Engineer, Platform Automation Engineering, alfasi@netflix.com, @niralfasi, http://www.linkedin.com/in/alfasin; Christos Kalantzis, Director of Engineering, Cloud Database Engineering, Cassandra MVP, ckalantzis@netflix.com, @chriskalan, http://www.linkedin.com/in/christoskalantzis
  3. AWS Re:Boot (September 2014, every AZ)
  4. Our stack during Re:boot 2014 (diagram labels: C* / Priam (×3), REST + SSH)
  5. Our stack during Re:boot 2014
  6. Our stack during Re:boot 2014
  7. Our stack during Re:boot 2014 (diagram labels: C* / Priam (×3), REST + SSH, Atlas (×2), App 1, App 2)
  8. Our stack during Re:boot 2014
  9. Our stack during Re:boot 2014 (flowchart labels, in extraction order, run every 30 min: Disappearing instance?; Launch new instance; All good; Is the C* ring healthy?; Yes; Are all instances healthy?; Yes; All good; Can we fix automatically?; Replace bad instance; All good; Is there an offline maintenance?; First failure?; Sleep for X minutes and retry; PagerDuty; No; Is there an offline maintenance?; First failure?; All good)
  10. Our stack during Re:boot 2014 (AWS Re:Boot, September 2014, every AZ)
  11. Gaps we identified
  12. Gaps we identified
  13. Gaps we identified
  14. Gaps we identified
  15. New direction
  16. New direction – What others are doing
  17. New direction – What we decided to do
  18. New direction – What we decided to do
  19. New direction – What we decided to do (diagram labels: C* / Priam (×3), Atlas (×2), App 1, App 2)
  20. New direction – What we learned (principles)
  21. New direction – What we learned (principles)
  22. New direction – What we learned (principles) (diagram labels: Synchronous, Asynchronous, SSH, HTTP / REST)
  23. New direction – What we learned (principles)
  24. New direction – What we learned (principles)
  25. What does the future look like?
  26. What does the future look like?
  27. What does the future look like?
  28. Check out our https://jobs.netflix.com page for current openings
  29. Who we are: Jean-Sebastien Jeannotte (JS), Senior Software Engineer, Platform Automation Engineering, jjeannotte@netflix.com, @jsjeannotte, http://www.linkedin.com/in/jsjeannotte; Nir Alfasi, Senior Software Engineer, Platform Automation Engineering, alfasi@netflix.com, @niralfasi, http://www.linkedin.com/in/alfasin; Christos Kalantzis, Director of Engineering, Cloud Database Engineering, Cassandra MVP, ckalantzis@netflix.com, @chriskalan, http://www.linkedin.com/in/christoskalantzis
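
The flowchart on slide 9 lost its arrows in this transcript, so the exact branching is not recoverable. The sketch below is one plausible reading of its labels, not the presenters' implementation: a check that runs every 30 minutes, replaces what it can, tolerates failures during offline maintenance, retries once, and pages otherwise. All names (`ClusterState`, `Action`, `decide`) are hypothetical and introduced only for illustration; nothing here reflects Priam's or Atlas's real APIs.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Outcomes named on the slide's flowchart."""
    ALL_GOOD = "all good"
    LAUNCH_NEW_INSTANCE = "launch new instance"
    REPLACE_BAD_INSTANCE = "replace bad instance"
    SLEEP_AND_RETRY = "sleep for X minutes and retry"
    PAGE = "PagerDuty"


@dataclass
class ClusterState:
    """Answers to the flowchart's questions (hypothetical structure)."""
    instance_disappeared: bool
    ring_healthy: bool
    all_instances_healthy: bool
    can_fix_automatically: bool
    offline_maintenance: bool
    first_failure: bool


def _escalate(state: ClusterState) -> Action:
    # Assumed branch order: maintenance suppresses the alert, a first
    # failure earns one retry, anything else pages a human.
    if state.offline_maintenance:
        return Action.ALL_GOOD
    if state.first_failure:
        return Action.SLEEP_AND_RETRY
    return Action.PAGE


def decide(state: ClusterState) -> Action:
    """Pick one action for a single pass of the periodic check."""
    if state.instance_disappeared:
        return Action.LAUNCH_NEW_INSTANCE
    if not state.ring_healthy:
        return _escalate(state)
    if state.all_instances_healthy:
        return Action.ALL_GOOD
    if state.can_fix_automatically:
        return Action.REPLACE_BAD_INSTANCE
    return _escalate(state)
```

Keeping the decision as a pure function of the observed state makes each branch easy to test in isolation; the scheduling (every 30 minutes) and the side effects (launching, replacing, paging) would live in a separate caller.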