Netflix: Embracing the Cloud (AWS re:Invent 2012)

  1. Netflix: Embracing the Cloud. Neil Hunt, CPO / Yury Izrailevsky, VP Engineering
  2. Netflix – Service Unavailable – Database Crashed. Rest assured that the right people are losing sleep to fix this problem! We expect to resume service in approximately 72h. 12 Aug 2008 03:12am
  3. Availability: 4 x nines • Scale: unconstrained horizontal scaling • Performance: unlimited compute
  4. • Experimented with both (SQL and NoSQL)
     • Ended up with NoSQL for almost everything important
  5. Transitional Infrastructure: “Roman Riding”
  6. Migration phases (RO = read-only, RW = read-write):
     | Phase | Components | Data & Prerequisites |
     |---|---|---|
     | Trial (2009) | Streaming player | Content keys (RO); membership status (RO) |
     | Development (2010-11) | Member product pages and APIs & recs algorithms | Content catalog (RW); personalization data (RW); AB test data (RW) |
     | Followthrough (2011-12) | Account and membership | Membership data (RW) |
     | Final (2013) | Payments | PCI and SOX data |
  7. Availability: 4 x nines • Scale: unconstrained horizontal scaling • Performance: unlimited compute
  8. Scalability / Performance / Availability
  9. Scalability / Performance / Availability
  10. Scaling Netflix Streaming Service: Weekly Streaming Starts (chart, January 2009 to August 2012)
  11. Netflix Cross-Regional Cloud Architecture
  12. Goal: Regional Failover
  13. Building Global Netflix Streaming Product
  14. Scalability / Performance / Availability
  15. Weekly Cloud Cost Per Streaming Start (chart, last 12 months)
  16. Simian Army: Cloud Efficiency Automation
      • Janitor Monkey: regularly scrape unused capacity; clean up instances, ASGs, ELBs, SGs, etc.
      • Efficiency Monkey: AI-based resource under-usage detection (CPU, memory, etc.)
      • Automated deletion of old data: TTL for S3 (using Object Expiration)
  17. Cyclical Streaming Usage Pattern
  18. Load-Based Auto Scaling
      • 50%+ cost saving
      • Scale up/down by 70%+
      • Move to load-based scaling
  19. Scalability / Performance / Availability
  20. A Truly Great Service… Has To Just Work! Availability goal: 99.99% (30 secs/week at peak traffic)
  21. Historical Streaming Availability (13-week moving average, chart, July 2011 to November 2012). Chart annotations: AWS / Netflix outage, June 29th, 2012; other AWS outages; using redundancy in AWS infrastructure to survive failures.
  22. Cascading Failures: API, Instant Queue, SimpleDB
  23. Netflix Cloud Architecture
  24. Cascading Failures: 300 chained dependencies, each at 99% availability, give $0.99^{300} \approx 4.90\%$ overall availability
  25. Strategies to Improve Availability: Graceful Degradation, Redundancy
  26. Graceful Degradation
  27. Redundancy
      • Redundancy across Availability Zones: Cassandra in Zones A, B, and C
      • Secure cloud backup storage: S3 backup
      • Redundancy across regions, vendors
  28. Testing Fault Tolerance: Simian Army (Chaos Monkey, Latency Monkey, Chaos Gorilla)
  29. Open Source Portal at http://netflix.github.com
  30. Superstorm Sandy: AWS infrastructure held up; >2x Netflix streaming usage in East Coast markets (Boston, New York, Philadelphia, Baltimore, D.C.)
  31. Focus on Building a Great Streaming Product
  32. Netflix at 2012 re:Invent
      | Date/Time | Presenter | Topic |
      |---|---|---|
      | Wed 8:30-10:00 | Reed Hastings | Keynote with Andy Jassy |
      | Wed 1:00-1:45 | Coburn Watson | Optimizing Costs with AWS |
      | Wed 2:05-2:55 | Kevin McEntee | Netflix’s Transcoding Transformation |
      | Wed 3:25-4:15 | Neil Hunt / Yury I. | Netflix: Embracing the Cloud |
      | Wed 4:30-5:20 | Adrian Cockcroft | High Availability Architecture at Netflix |
      | Thu 10:30-11:20 | Jeremy Edberg | Rainmakers – Operating Clouds |
      | Thu 11:35-12:25 | Kurt Brown | Data Science with Elastic Map Reduce (EMR) |
      | Thu 11:35-12:25 | Jason Chan | Security Panel: Learn from CISOs working with AWS |
      | Thu 3:00-3:50 | Adrian Cockcroft | Compute & Networking Masters Customer Panel |
      | Thu 3:00-3:50 | Ruslan M. / Gregg U. | Optimizing Your Cassandra Database on AWS |
      | Thu 4:05-4:55 | Ariel Tseitlin | Intro to Chaos Monkey and the Simian Army |
  33. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
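Slide 16 mentions deleting old S3 data automatically with a TTL via Object Expiration. As a minimal sketch, assuming today's boto3 SDK rather than whatever tooling Netflix used in 2012, the helper below builds a lifecycle rule in the shape that boto3's S3 client method `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. The helper name `expiration_lifecycle`, the `logs/` prefix, the 30-day TTL, and the bucket name are hypothetical.

```python
def expiration_lifecycle(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under `prefix`.

    Hypothetical helper: the returned dict follows the shape boto3 expects for
    the LifecycleConfiguration argument of put_bucket_lifecycle_configuration.
    """
    if days < 1:
        # S3 requires expiration days to be a positive integer.
        raise ValueError("S3 expiration must be at least 1 day")
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/') or 'all'}-after-{days}d",
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},  # the TTL itself
            }
        ]
    }


config = expiration_lifecycle("logs/", 30)
# With AWS credentials configured, the rule could be applied like this (not run here):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Pushing the TTL into the bucket itself means no cleanup job has to run; S3 deletes the expired objects on its own schedule.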
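Slide 18 lists load-based auto scaling alongside 50%+ cost savings. The idea is to size a group to current load instead of to the daily peak of the cyclical pattern on slide 17. The sketch below is hedged: `desired_instances`, the per-instance capacity, the 50% target utilization, and the load figures are invented for illustration and are not Netflix's scaling policy.

```python
import math


def desired_instances(current_load_rps: float,
                      rps_per_instance: float,
                      target_utilization: float = 0.5,
                      min_instances: int = 1,
                      max_instances: int = 1000) -> int:
    """Hypothetical sizing rule: enough instances to keep each at the target utilization.

    Capacity per instance and the utilization headroom are assumptions; in
    practice they would come from load testing.
    """
    needed = current_load_rps / (rps_per_instance * target_utilization)
    # Round up so we never under-provision, then clamp to the group's bounds.
    return max(min_instances, min(max_instances, math.ceil(needed)))


# Made-up load figures for a daily trough and peak.
trough = desired_instances(current_load_rps=3_000, rps_per_instance=100)
peak = desired_instances(current_load_rps=10_000, rps_per_instance=100)
print(trough, peak)  # 60 200
```

With these made-up numbers the trough needs 70% fewer instances than the peak; a fleet provisioned for the peak pays for that difference around the clock.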
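Slide 24's arithmetic can be checked with a short sketch. It assumes, as the slide's multiplication implies, that a request fails if any one of its 300 dependencies fails and that those failures are independent; `serial_availability` is a hypothetical helper name.

```python
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a request path that needs every dependency to be up.

    Assumes independent failures, so the per-dependency probabilities multiply.
    """
    return prod(availabilities)


# Slide 24: 300 dependencies, each 99% available.
overall = serial_availability([0.99] * 300)
print(f"{overall:.2%}")  # 4.90%
```

The product shrinks fast: even at 99.99% per dependency, 300 of them in series land near 97%, which is why the next slides turn to graceful degradation and redundancy.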
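Slide 26 names graceful degradation without showing how it works. One common pattern, sketched here as an assumption rather than Netflix's code, is to wrap each call to a dependency and return a reduced but usable answer when the call fails. `with_fallback`, the failing personalization call, and the popular-titles fallback are all hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Call a dependency; if it raises, return a degraded but usable answer."""
    try:
        return primary()
    except Exception:
        # The dependency is down or erroring: degrade instead of propagating the failure.
        return fallback()


def fetch_personalized_rows() -> list[str]:
    # Hypothetical personalization call that is currently failing.
    raise ConnectionError("personalization service unavailable")


def popular_titles() -> list[str]:
    # Hypothetical static, non-personalized fallback list.
    return ["Popular title 1", "Popular title 2"]


rows = with_fallback(fetch_personalized_rows, popular_titles)
print(rows)  # ['Popular title 1', 'Popular title 2']
```

Catching bare `Exception` keeps the sketch short; production code would usually catch only the errors it expects and add a timeout, so that a slow dependency degrades the same way a failed one does.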
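Slide 27 shows Cassandra replicated across three Availability Zones. The benefit can be estimated with the complement rule: the service is down only if every replica is down. This sketch assumes zone failures are independent, which real correlated outages can violate; `redundant_availability` and the 99% per-zone figure are hypothetical.

```python
from math import prod


def redundant_availability(replica_availabilities: list[float]) -> float:
    """Availability when any one replica can serve the request.

    Assumes replica failures are independent (separate zones that do not fail
    together); correlated failures make the real number lower.
    """
    return 1 - prod(1 - a for a in replica_availabilities)


# Hypothetical example: zones A, B, and C, each 99% available.
print(f"{redundant_availability([0.99, 0.99, 0.99]):.6f}")  # 0.999999
```

This is the mirror image of slide 24: chaining dependencies multiplies availabilities, while adding redundant replicas multiplies unavailabilities.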
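Slide 28 lists Chaos Monkey, which tests fault tolerance by randomly terminating production instances so that teams must build services that survive the loss. The sketch below only simulates the selection step under a simplifying assumption (exactly one victim per non-empty group); `chaos_monkey_round` and the fleet layout are hypothetical, and no cloud API is called.

```python
import random


def chaos_monkey_round(groups: dict[str, list[str]],
                       rng: random.Random) -> dict[str, str]:
    """Pick one instance per group to terminate (simulation only).

    Simplification: the real tool decides per group whether and when to kill
    an instance; here every non-empty group loses exactly one.
    """
    victims = {}
    for group, instances in groups.items():
        if instances:  # skip groups with nothing to terminate
            victims[group] = rng.choice(instances)
    return victims


fleet = {"api": ["i-1", "i-2", "i-3"], "queue": ["i-4", "i-5"], "empty": []}
victims = chaos_monkey_round(fleet, random.Random(42))
for group, instance in victims.items():
    fleet[group].remove(instance)  # simulate the termination
print(victims)
```

Passing a seeded `random.Random` keeps test runs reproducible while still exercising the random choice.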
