Scaling the Netflix API - OSCON

In engineering, the term "scale" is often used to describe systems and their ability to grow with the needs of their users. This is clearly an important aspect of scaling, but there are many other areas in which an engineering organization needs to scale to be successful in the long term. This presentation discusses some of those other areas and details how Netflix (and specifically the API team) addresses them.

  1. 1. Scaling the Netflix API Daniel Jacobson @daniel_jacobson http://www.linkedin.com/in/danieljacobson http://www.slideshare.net/danieljacobson
  2. 2. Please read the notes associated with each slide for the full context of the presentation
  3. 3. What do I mean by “scale”?
  4. 4. But There Are Many Ways to Scale! Organization Systems Devices Development Testing
  5. 5. But first, some background…
  6. 6. Global Streaming Video for TV Shows and Movies
  7. 7. More than 36 Million Subscribers More than 40 Countries
  8. 8. Netflix Accounts for 33% of Peak Internet Traffic in North America Netflix subscribers are watching more than 1 billion hours a month
  9. 9. Netflix REST API: One-Size-Fits-All (OSFA) Solution
  10. 10. Image courtesy of Jay Mac 3 on Flickr
  11. 11. Netflix API Requests by Audience At Launch In 2008 External Developers
  12. 12. Image courtesy of Jay Mac 3 on Flickr
  13. 13. Netflix API Requests by Audience From 2011 External Developers
  14. 14. Scaling… Organization Systems Devices Development Testing
  15. 15. Distributed Architecture
  16. 16. 1000+ Device Types
  17. 17. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies Reviews A/B Test Engine Dozens of Dependencies
  18. 18. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine
  19. 19. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine
  20. 20. http://www.slideshare.net/reed2001/culture-1798664
  21. 21. Scaling… Organization Systems Devices Development Testing
  22. 22. System Resiliency
  23. 23. Distributed Architecture
  24. 24. Dependency Relationships
  25. 25. 2,000,000,000 Requests Per Day to the Netflix API
  26. 26. 30 Distinct, Direct Dependent Services for the Netflix API
  27. 27. 14,000,000,000 Netflix API Calls Per Day to those Dependent Services
  28. 28. 0 Dependent Services with 100% SLA
  29. 29. 99.99%^30 = 99.7% 0.3% of 2B = 6M failures per day 2+ Hours of Downtime Per Month
  30. 30. 99.99%^30 = 99.7% 0.3% of 2B = 6M failures per day 2+ Hours of Downtime Per Month
  31. 31. 99.9%^30 = 97% 3% of 2B = 60M failures per day 20+ Hours of Downtime Per Month
  32. 32. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine
  33. 33. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine
  34. 34. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine
  35. 35. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine
  36. 36. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine
  37. 37. Circuit Breaker Dashboard
  38. 38. Call Volume and Health / Last 10 Seconds
  39. 39. Call Volume / Last 2 Minutes
  40. 40. Successful Requests
  41. 41. Successful, But Slower Than Expected
  42. 42. Short-Circuited Requests, Delivering Fallbacks
  43. 43. Timeouts, Delivering Fallbacks
  44. 44. Thread Pool & Task Queue Full, Delivering Fallbacks
  45. 45. Exceptions, Delivering Fallbacks
  46. 46. Error Rate (# + # + # + #) / (# + # + # + # + #) = Error Rate
  47. 47. Status of Fallback Circuit
  48. 48. Requests per Second, Over Last 10 Seconds
  49. 49. SLA Information
  50. 50. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine
  51. 51. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine
  52. 52. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine
  53. 53. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine Fallback
  54. 54. Personalization Engine User Info Movie Metadata Movie Ratings Similar Movies API Reviews A/B Test Engine Fallback
  55. 55. System Infrastructure
  56. 56. AWS Cloud
  57. 57. Autoscaling
  58. 58. Autoscaling
  59. 59. More than 36 Million Subscribers More than 40 Countries
  60. 60. Zuul Gatekeeper for the Netflix Streaming Application
  61. 61. Zuul • Multi-Region Resiliency • Insights • Stress Testing • Canary Testing • Dynamic Routing • Load Shedding • Security • Static Response Handling • Authentication
  62. 62. Isthmus
  63. 63. Forced Failure
  64. 64. Scaling… Organization Systems Devices Development Testing
  65. 65. Screen Real Estate
  66. 66. Controller
  67. 67. Technical Capabilities
  68. 68. One-Size-Fits-All API Request Request Request
  69. 69. Scaling… Organization Systems Devices Development Testing
  70. 70. Courtesy of South Florida Classical Review
  71. 71. Resource-Based API vs. Experience-Based API
  72. 72. Resource-Based Requests • /users/<id>/ratings/title • /users/<id>/queues • /users/<id>/queues/instant • /users/<id>/recommendations • /catalog/titles/movie • /catalog/titles/series • /catalog/people
  73. 73. REST API RECOMMENDATIONS MOVIE DATA SIMILAR MOVIES AUTH MEMBER DATA A/B TESTS START-UP RATINGS Network Border Network Border
  74. 74. RECOMMENDATIONS MOVIE DATA SIMILAR MOVIES AUTH MEMBER DATA A/B TESTS START-UP RATINGS OSFA API Network Border Network Border SERVER CODE CLIENT CODE
  75. 75. RECOMMENDATIONS MOVIE DATA SIMILAR MOVIES AUTH MEMBER DATA A/B TESTS START-UP RATINGS OSFA API Network Border Network Border DATA GATHERING, FORMATTING, AND DELIVERY USER INTERFACE RENDERING
  76. 76. Experience-Based Requests • /ps3/homescreen
  77. 77. JAVA API Network Border Network Border RECOMMENDATIONS MOVIE DATA SIMILAR MOVIES AUTH MEMBER DATA A/B TESTS START-UP RATINGS Groovy Layer
  78. 78. RECOMMENDATIONS MOVIE DATA SIMILAR MOVIES AUTH MEMBER DATA A/B TESTS START-UP RATINGS JAVA API SERVER CODE CLIENT CODE CLIENT ADAPTER CODE (WRITTEN BY CLIENT TEAMS, DYNAMICALLY UPLOADED TO SERVER) Network Border Network Border
  79. 79. RECOMMENDATIONS MOVIE DATA SIMILAR MOVIES AUTH MEMBER DATA A/B TESTS START-UP RATINGS JAVA API DATA GATHERING DATA FORMATTING AND DELIVERY USER INTERFACE RENDERING Network Border Network Border
  80. 80. Scaling… Organization Systems Devices Development Testing
  81. 81. Dependency Relationships
  82. 82. Testing Philosophy: Act Fast, React Fast
  83. 83. That Doesn’t Mean We Don’t Test • Unit tests • Functional tests • Regression scripts • Continuous integration • Capacity planning • Load / Performance tests
  84. 84. Cloud-Based Deployment Techniques
  85. 85. Current Code In Production API Requests from the Internet
  86. 86. Canary Analysis Automation
  87. 87. Single Canary Instance To Test New Code with Production Traffic (around 1% or less of traffic) Current Code In Production API Requests from the Internet Error!
  88. 88. Current Code In Production API Requests from the Internet
  89. 89. Current Code In Production API Requests from the Internet Perfect!
  90. 90. Current Code In Production API Requests from the Internet Perfect!
  91. 91. Stress Test with Zuul
  92. 92. Current Code In Production API Requests from the Internet New Code Getting Prepared for Production
  93. 93. Current Code In Production API Requests from the Internet New Code Getting Prepared for Production
  94. 94. Error! Current Code In Production API Requests from the Internet New Code Getting Prepared for Production
  95. 95. Current Code In Production API Requests from the Internet New Code Getting Prepared for Production
  96. 96. Current Code In Production API Requests from the Internet Perfect!
  97. 97. Stress Test with Zuul
  98. 98. Current Code In Production API Requests from the Internet New Code Getting Prepared for Production
  99. 99. Current Code In Production API Requests from the Internet New Code Getting Prepared for Production
  100. 100. API Requests from the Internet New Code Getting Prepared for Production
  101. 101. https://www.github.com/Netflix
  102. 102. Scaling the Netflix API Daniel Jacobson @daniel_jacobson http://www.linkedin.com/in/danieljacobson http://www.slideshare.net/danieljacobson
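
As a rough check of the availability arithmetic on slides 29 through 31, the following Python sketch recomputes the figures. It assumes (this is not stated on the slides) that every API request depends on all 30 services, that the services fail independently, and that a month has 30 days. The function names are hypothetical.

```python
# Sketch only: assumes every request needs all 30 dependencies and that
# their failures are independent. Neither assumption comes from the slides.

def compound_availability(per_service: float, n_services: int) -> float:
    """Availability of a request that succeeds only if all n services succeed."""
    return per_service ** n_services


def expected_failures(availability: float, requests_per_day: float) -> float:
    """Expected number of failed requests per day at the given availability."""
    return (1.0 - availability) * requests_per_day


def downtime_hours_per_month(availability: float, days: int = 30) -> float:
    """Equivalent hours of unavailability in a month of `days` days."""
    return (1.0 - availability) * days * 24


REQUESTS_PER_DAY = 2_000_000_000  # figure from slide 25
DEPENDENCIES = 30                 # figure from slide 26

for sla in (0.9999, 0.999):
    combined = compound_availability(sla, DEPENDENCIES)
    failures = expected_failures(combined, REQUESTS_PER_DAY)
    hours = downtime_hours_per_month(combined)
    print(f"per-service {sla:.2%} -> combined {combined:.2%}, "
          f"about {failures / 1e6:.0f}M failures/day, {hours:.1f} h/month")
```

The unrounded results come out slightly different from the slide figures, which round the combined availability to 99.7% and 97% before computing failures and downtime.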

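The circuit-breaker behavior shown on the dashboard slides (37 through 54) can be illustrated with a minimal Python sketch. This is not Netflix's implementation: the class name, parameters, and thresholds are assumptions chosen for illustration, and only exceptions are counted as failures (timeouts and thread-pool rejection are not modeled separately).

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Illustrative breaker: trips on a rolling error rate, then serves fallbacks."""

    def __init__(self, error_threshold: float = 0.5, window_seconds: float = 10.0,
                 min_requests: int = 5, cooldown_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.error_threshold = error_threshold
        self.window_seconds = window_seconds
        self.min_requests = min_requests
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, succeeded)
        self._opened_at: Optional[float] = None

    def _prune(self, now: float) -> None:
        # Drop events that have fallen out of the rolling window.
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()

    def error_rate(self) -> float:
        """Failures divided by all recorded calls in the rolling window."""
        self._prune(self._clock())
        if not self._events:
            return 0.0
        failures = sum(1 for _, ok in self._events if not ok)
        return failures / len(self._events)

    def _is_open(self, now: float) -> bool:
        if self._opened_at is None:
            return False
        if now - self._opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: close the circuit and start a fresh window.
            self._opened_at = None
            self._events.clear()
            return False
        return True

    def _record(self, now: float, ok: bool) -> None:
        self._events.append((now, ok))
        if (len(self._events) >= self.min_requests
                and self.error_rate() >= self.error_threshold):
            self._opened_at = now  # trip the circuit

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        now = self._clock()
        if self._is_open(now):
            return fallback()  # short-circuited: the dependency is not called
        try:
            result = primary()
        except Exception:
            self._record(now, False)
            return fallback()  # failure: deliver the fallback instead
        self._record(now, True)
        return result


def flaky_ratings() -> list:
    """Hypothetical dependency call that always fails."""
    raise TimeoutError("ratings service did not respond")


breaker = CircuitBreaker(min_requests=3)
for _ in range(5):
    # Every call returns the fallback; once the circuit trips,
    # flaky_ratings is no longer invoked at all.
    print(breaker.call(flaky_ratings, lambda: []))
```

Unlike the dashboard formula on slide 46, this sketch does not count short-circuited requests in the error rate, because it stops recording events while the circuit is open.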