2012 re:Invent Netflix: embracing the cloud final
  • Make clear it’s still tentative, not a committed project – longer term…
  • Transcript

    • 1. Netflix: Embracing the Cloud. Neil Hunt, CPO / Yury Izrailevsky, VP Engineering
    • 2. Netflix – Service Unavailable – Database Crashed. Rest assured that the right people are losing sleep to fix this problem! We expect to resume service in approximately 72h. 12 Aug 2008 03:12am
    • 3. Availability: 4 x nines. Scale: unconstrained horizontal scaling. Performance: unlimited compute.
    • 4. Experimented with both; ended up with NoSQL for almost everything important
    • 5. Transitional Infrastructure: “Roman Riding”
    • 6. Phase | Components | Data & Prerequisites
      Trial (2009) | Streaming Player | Content keys (RO); Membership status (RO)
      Development (2010-11) | Member product pages and APIs & recs algorithms | Content catalog (RW); Personalization data (RW); AB Test data (RW)
      Followthrough (2011-12) | Account and membership | Membership data (RW)
      Final (2013) | Payments | PCI and SOX data
    • 7. Availability: 4 x nines. Scale: unconstrained horizontal scaling. Performance: unlimited compute.
    • 8. Scalability Performance Availability
    • 9. Scalability Performance Availability
    • 10. Scaling Netflix Streaming Service: Weekly Streaming Starts (chart; x-axis runs from 1/4/2009 to 8/4/2012)
    • 11. Netflix Cross-Regional Cloud Architecture
    • 12. Goal: Regional Failover
    • 13. Building Global Netflix Streaming Product
    • 14. Scalability Performance Availability
    • 15. Weekly Cloud Cost Per Streaming Start (last 12 months)
    • 16. Simian Army: Cloud Efficiency Automation
      • Janitor Monkey: regularly scrape unused capacity; clean up instances, ASGs, ELBs, SGs, etc.
      • Efficiency Monkey: AI-based resource under-usage detection (CPU, memory, etc.)
      • Automated Deletion of Old Data: TTL for S3 (using ObjectExpiration)
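
The "TTL for S3" item on slide 16 corresponds to an S3 lifecycle rule with an expiration action. The sketch below only builds such a rule as a plain dictionary in the shape accepted by the S3 lifecycle configuration API; the rule ID, the `logs/` prefix, the 30-day TTL, and the helper name `expiration_rule` are illustrative assumptions, not details from the talk.

```python
import json

def expiration_rule(rule_id, prefix, days):
    """Build one S3 lifecycle rule that expires objects under `prefix` after `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "ID": rule_id,                   # hypothetical rule name
        "Filter": {"Prefix": prefix},    # apply only to keys under this prefix
        "Status": "Enabled",
        "Expiration": {"Days": days},    # the TTL
    }

# Hypothetical example: delete log objects 30 days after creation.
lifecycle = {"Rules": [expiration_rule("expire-old-logs", "logs/", 30)]}
print(json.dumps(lifecycle, indent=2))

# With boto3 installed and credentials configured, this could be applied as:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle)
```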
    • 17. Cyclical Streaming Usage Pattern
    • 18. Load-Based Auto Scaling: scale up/down by 70%+; 50%+ cost saving from the move to load-based scaling
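
A minimal sketch of the load-based sizing idea on slide 18. It is not the policy Netflix or AWS Auto Scaling actually uses: the function name `desired_instances`, the per-instance capacity, the 20% headroom, and the fleet bounds are all hypothetical values chosen for illustration.

```python
def desired_instances(requests_per_sec, capacity_per_instance,
                      headroom_pct=20, min_instances=2, max_instances=1000):
    """Return the fleet size for the current load (all parameters are assumptions)."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Integer ceiling division: observed load plus headroom, divided by capacity.
    numerator = requests_per_sec * (100 + headroom_pct)
    denominator = 100 * capacity_per_instance
    needed = -(-numerator // denominator)
    # Clamp to the configured fleet bounds.
    return max(min_instances, min(max_instances, needed))

# Hypothetical daily peak and trough for a cyclical traffic pattern.
peak = desired_instances(requests_per_sec=10_000, capacity_per_instance=100)
trough = desired_instances(requests_per_sec=3_000, capacity_per_instance=100)
print(peak, trough)
```

Sizing the fleet from observed load rather than a fixed peak estimate is what lets capacity follow the daily cycle shown on the previous slide.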
    • 19. Scalability Performance Availability
    • 20. A Truly Great Service… Has To Just Work! Availability Goal: 99.99% (30 secs/week at peak traffic)
    • 21. Using Redundancy in AWS Infrastructure to Survive Failures (chart: Historical Streaming Availability, 13wkMA, weekly from 7/17/2011 to 11/11/2012; annotations: AWS / Netflix Outage, June 29th, 2012; Other AWS Outages)
    • 22. Cascading Failures: API, Instant Queue, SimpleDB
    • 23. Netflix Cloud Architecture
    • 24. Cascading Failures: 99% Availability × 99% Availability × … × 99% Availability; (99%)^300 = 4.90%
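
The arithmetic on slide 24 can be checked with a short sketch. It assumes, as the slide's figure implies, that a request depends on all 300 services and that their failures are independent; the function name `chain_availability` is hypothetical.

```python
def chain_availability(per_service, n_services):
    """Availability of a request that needs every one of n independent services to succeed."""
    return per_service ** n_services

# 300 services at 99% each, as on the slide.
overall = chain_availability(0.99, 300)
print(f"{overall:.2%}")
```

The result rounds to the slide's figure of roughly 4.9%, which is why the deck moves on to graceful degradation and redundancy rather than relying on each service's own availability.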
    • 25. Strategies to Improve Availability: Graceful Degradation, Redundancy
    • 26. Graceful Degradation
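
One common way to implement graceful degradation is a fallback wrapper: if a dependency fails, serve a generic answer instead of failing the whole request. The sketch below is an assumption about the general pattern, not Netflix's implementation; `with_fallback`, `personalized_rows`, and `popular_rows` are hypothetical names.

```python
def with_fallback(primary, fallback):
    """Wrap `primary` so that any exception yields the result of `fallback` instead."""
    def wrapped(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            # Degrade: return a generic result rather than propagate the failure.
            return fallback(*args, **kwargs)
    return wrapped

def personalized_rows(member_id):
    # Hypothetical dependency that is currently failing.
    raise TimeoutError("recommendation service unavailable")

def popular_rows(member_id):
    # Hypothetical non-personalized default.
    return ["Popular titles"]

get_rows = with_fallback(personalized_rows, popular_rows)
print(get_rows(42))
```

A production version would usually add timeouts and stop calling a dependency that keeps failing, so that slow calls do not tie up the caller.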
    • 27. Redundancy
      • Redundancy Across Availability Zones: Cassandra in Zone A, Zone B, Zone C
      • Redundancy Across Regions, Vendors: S3 Backup; Secure Cloud Backup Storage
    • 28. Testing Fault Tolerance: Simian Army (Chaos Monkey, Latency Monkey, Chaos Gorilla)
    • 29. Open Source Portal at http://netflix.github.com
    • 30. Superstorm Sandy: AWS Infrastructure Held Up. >2x Netflix Streaming Usage in East Coast Markets: Boston, New York, Philadelphia, Baltimore, D.C.
    • 31. Focus on Building a Great Streaming Product
    • 32. Netflix at 2012 re:Invent
      Date/Time | Presenter | Topic
      Wed 8:30-10:00 | Reed Hastings | Keynote with Andy Jassy
      Wed 1:00-1:45 | Coburn Watson | Optimizing Costs with AWS
      Wed 2:05-2:55 | Kevin McEntee | Netflix’s Transcoding Transformation
      Wed 3:25-4:15 | Neil Hunt / Yury I. | Netflix: Embracing the Cloud
      Wed 4:30-5:20 | Adrian Cockcroft | High Availability Architecture at Netflix
      Thu 10:30-11:20 | Jeremy Edberg | Rainmakers – Operating Clouds
      Thu 11:35-12:25 | Kurt Brown | Data Science with Elastic Map Reduce (EMR)
      Thu 11:35-12:25 | Jason Chan | Security Panel: Learn from CISOs working with AWS
      Thu 3:00-3:50 | Adrian Cockcroft | Compute & Networking Masters Customer Panel
      Thu 3:00-3:50 | Ruslan M./Gregg U. | Optimizing Your Cassandra Database on AWS
      Thu 4:05-4:55 | Ariel Tseitlin | Intro to Chaos Monkey and the Simian Army
    • 33. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
    • 34. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.