Evolving the Netflix API

At Netflix, we provide an API that supports the content discovery, sign-up, and playback experience on thousands of device types that millions of people around the world use every day. As our user base and traffic have grown by leaps and bounds, we have continuously evolved this API to be flexible, scalable, and resilient, and to enable the best experience for our users. In this talk, I gave an overview of how and why the Netflix API has evolved to where it is today, and of how we make it resilient against failures while keeping it flexible and nimble enough to support continuous A/B testing.

Published in: Technology

  1. 1. Evolving the Netflix API Katharina Probst Engineering Manager, API October 2015
  2. 2. What is Netflix?
  3. 3. > 1000 Devices
  4. 4. Is it significant? ❏ Netflix accounts for almost 35% of peak downstream traffic in the US. ❏ Almost 70 million subscribers worldwide and growing. Source: http://www.sandvine.com/news/global_broadband_trends.asp
  5. 5. We’re going global!
  6. 6. Source: https://help.netflix.com/en/node/14164 Recent additions: Spain, Portugal, Italy Current availability
  7. 7. Netflix Originals
  8. 8. Do we need a Netflix API?
  9. 9. API Personalization Engine User Info Ratings Similar Movies A/B Test Engine ….
  10. 10. Uses ❏ Discovery ❏ Signup ❏ Playback ❏ Internal teams only API
  11. 11. Goals ❏ Flexibility ❏ Resiliency ❏ Scalability ❏ Excellent tools API
  12. 12. Goals ❏ Flexibility ❏ Resiliency ❏ Scalability ❏ Excellent tools API
  13. 13. Lots of devices, lots of variety
  14. 14. Different interaction models
  15. 15. And just to make things a little more interesting…. ❏ A/B tests ❏ profiles ❏ localization
  16. 16. What we felt we had What we needed
  17. 17. ❏ Reduce network chattiness ❏ Support device optimizations ❏ Enable faster development for internal users
  18. 18. Remote API: GET /users/{user_id}/lists Local Method: apiGateway.getLists(userId)
  19. 19. Discrete HTTP requests pay network tax repeatedly
  20. 20. Single, optimized request; pay network tax once
  21. 21. Single, optimized request; pay network tax once Client data assembly logic pushed to server
  22. 22. Add server-side scripting capability ❏ Enable independent development & device optimization ❏ Profit
  23. 23. Impact on velocity and collaboration ❏ UI (script) changes can happen independently ❏ Script changes can be pushed to running servers, so decoupled from API push schedule ❏ Server+UI changes usually involve API team
  24. 24. RxJava Hystrix JavaServiceLayer Mid-tier Services UI Teams Client Server Internet Application /tv/home API Team Service Teams
  25. 25. ELB Zuul Mid-tier Services Scriptable Backend Scriptable Backend + API Layer
  26. 26. Goals ❏ Flexibility ❏ Resiliency ❏ Scalability ❏ Excellent tools API
  27. 27. https://github.com/Netflix/Hystrix resilience patterns for distributed sys
  28. 28. Hystrix Primer ❏ Protection from and control over latency and failure from dependencies ❏ Stop cascading failures in a complex distributed system ❏ Fail fast and rapidly recover ❏ Fall back and gracefully degrade
  29. 29. Personalization Engine Similar Movies Movie Metadata Ratings User Info Instant Queue A/B Test Engine API
  30. 30. Personalization Engine Similar movies Movie Metadata Ratings User Info Instant Queue A/B Test Engine API
  31. 31. API Personalization Engine Similar movies Movie Metadata Ratings User Info Instant Queue A/B Test Engine Beware Cascading Failure!
  32. 32. Personalization Engine Similar Movies Movie Metadata Ratings User Info Instant Queue A/B Test Engine API
  33. 33. Personalization Engine Similar Movies Movie Metadata Ratings User Info Instant Queue A/B Test Engine Fallback Response Local Fallback Avoids Cascading Failure! API
  34. 34. Personalization Engine Similar Movies Movie Metadata Ratings User Info Instant Queue A/B Test Engine Fallback Response Use FIT to test such failures API
  35. 35. Goals ❏ Flexibility ❏ Resiliency ❏ Scalability ❏ Excellent tools API
  36. 36. Autoscaling & Capacity Management http://nflx.it/1LvqLUi
  37. 37. AWS Controls Reactive, does not scale up fast enough
  38. 38. Fine-grained Control with Scryer Complements AWS Controls ❏ Faster scale-up, improved cost ❏ Use reactive policy for organic scale down
  39. 39. Goals ❏ Flexibility ❏ Resiliency ❏ Scalability ❏ Excellent tools API
  40. 40. Run 1% of your traffic on the new code and see how it does
  41. 41. So you’ve run a canary. Now what? Control vs. Canary ❏ Errors: 2xx, 4xx, 5xx ❏ latency ❏ network ❏ busy threads ❏ load ❏ ...
  42. 42. Successful canary red/black push
  43. 43. Continuous Delivery http://techblog.netflix.com/2015/09/moving-from-asgard-to-spinnaker.html
  44. 44. Quickly see status of all clusters http://techblog.netflix.com/2015/09/moving-from-asgard-to-spinnaker.html
  45. 45. Script Management
  46. 46. Deployment & Ops
  47. 47. Deployment & Ops
  48. 48. Deployment & Ops
  49. 49. Real-time analysis http://www.slideshare.net/g9yuayon/qcon-talk-on-netflix-mantis-a-stream-processing-system Submit a query, see requests in real time.
  50. 50. Looking ahead - current challenges ❏ Breaking up the monolith ❏ Script isolation ❏ Thin client libraries ❏ New interaction models
  51. 51. Looking ahead Source: http://techcrunch.com/2014/03/08/success-reality-and-the-myth-of-up-and-to-the-right/
  52. 52. Looking ahead ❏ Breaking up the monolith ❏ Script isolation ❏ Thin client libraries ❏ New interaction models
  53. 53. ● > 900 active endpoints ● ~ 30 client libraries ● 78 thread pools ● high memory usage Breaking up the monolith
  54. 54. Script isolation & node ❏ Groovy scripts run as part of API process ❏ UI teams would like to use other languages (in particular node.js) API remote service layer Service client libraries UI/device scripts (node) Falcor var response = model.get("todos[0..2]['name','done']");
  55. 55. Thin client libraries ❏ Many client libraries contain a lot of business logic and have a lot of dependencies ❏ Move business logic and dependencies to server API remote service layer Service client libraries UI/device scripts (node) Falcor
  56. 56. Looking ahead ❏ Breaking up the monolith ❏ Script isolation ❏ Thin client libraries ❏ New interaction models
  57. 57. New interaction models ❏ request/response ❏ request/stream ❏ fire-and-forget ❏ event subscription ❏ channel API remote service layer Service client libraries UI/device scripts (node) Falcor http://reactivesocket.io
  58. 58. In the beginning...
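
The resiliency slides (28 and 33) describe failing fast, falling back, and avoiding cascading failure when a dependency such as Similar Movies misbehaves. The following is a minimal sketch of that pattern in Python, not the Hystrix API: the `CircuitBreaker` class, its parameters, and the `similar_movies` and `popular_titles` functions are all hypothetical names invented for illustration. It assumes a breaker that opens after a fixed number of consecutive failures and lets a trial call through once a cool-down has elapsed.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical, simplified breaker; illustrative only, not Hystrix."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        """True while calls should skip the dependency entirely."""
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, let one trial call through.
        return self.clock() - self.opened_at < self.reset_after_s

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        # Fail fast: do not touch a dependency that is known to be unhealthy.
        if self.is_open():
            return fallback()
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            # Degrade gracefully instead of propagating the failure upstream.
            return fallback()
        # A success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def similar_movies() -> List[str]:
    # Stand-in for a remote dependency that is currently timing out.
    raise TimeoutError("similar-movies service timed out")


def popular_titles() -> List[str]:
    # Local fallback: a generic, non-personalized row.
    return ["generic popular title"]


breaker = CircuitBreaker(failure_threshold=2, reset_after_s=30.0)
rows = [breaker.call(similar_movies, popular_titles) for _ in range(3)]
```

In this sketch the third call never reaches `similar_movies`, because the breaker is already open after two failures; the caller still receives a usable, if less personalized, response. A production library would also bound latency with timeouts and isolate each dependency in its own thread pool, which this sketch leaves out.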
