Evolution of the Netflix API

1. Evolution of the Netflix API
QCon San Francisco, November 2013
Ben Christensen, Software Engineer, Edge & Playback Services at Netflix
@benjchristensen
http://techblog.netflix.com/

2. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/netflix-api-evolution

3. Presented at QCon San Francisco (www.qconsf.com)

5. More than 40 million Subscribers in 50+ Countries and Territories

6. Netflix accounts for 33% of peak downstream Internet traffic in North America. Netflix subscribers are watching more than 1 billion hours a month.

7. API traffic has grown from ~20 million/day in 2010 to >2 billion/day. [Chart: millions of API requests per day, 2010 to today]

8. Discovery Streaming

9. Netflix API Streaming

11. At the start ... [Chart: millions of API requests per day, 2008 to today]

12. 2008 API Launch: Targeted 100% at External Developers [Diagram: Open API, Netflix Devices]

13. 2008 API Launch: Targeted 100% at External Developers. Purpose: a "Thousand Flowers" of third-party innovation. Audience: external developers.

14. Pre-Cloud Architecture [Diagram: API, Services, Oracle]

15. In 2011 ... [Chart: millions of API requests per day, 2008 to today]

16. 99.9% Netflix Devices [Diagram: Open API, Netflix Devices]

17. Targeted at Internal Developers [Diagram: Open API, Netflix Devices]

18. Targeted at Internal Developers. Purpose: enable the Netflix experience. Audience: Netflix device and UI teams.

19. Scale & Resilience
Performance & Innovation

21. [Diagram: Netflix API and Dependencies A through R]

27. [Diagram: three AWS Availability Zones]

32. [Diagram: a user request and Dependencies A through R]

33. User request blocked by latency in a single network call. [Diagram: a user request and Dependencies A through R]

34. At high volume, all request threads can block in seconds. [Diagram: many user requests and Dependencies A through R]
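To see why thread exhaustion takes only seconds, a back-of-the-envelope calculation helps. The numbers here (a server with 200 request threads receiving 1,000 requests per second) are illustrative assumptions, not Netflix figures. If one dependency hangs and every incoming request waits on it, threads are consumed at the arrival rate and none are released; in the non-hung case, Little's law relates threads in use to arrival rate and latency:

```latex
t_{\text{exhaust}} = \frac{N_{\text{threads}}}{\lambda} = \frac{200}{1000\ \text{s}^{-1}} = 0.2\ \text{s}

L = \lambda W:\quad 1000\ \text{s}^{-1} \times 0.05\ \text{s} = 50 \ \text{threads}, \qquad 1000\ \text{s}^{-1} \times 1\ \text{s} = 1000 \ \text{threads}
```

So even without a full hang, a latency increase from 50 ms to 1 s in a single dependency would demand five times more threads than the assumed 200-thread pool holds.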

35. Dozens of dependencies.
One going bad takes everything down.
$0.9999^{30} \approx 0.997$, i.e. 99.7% uptime
0.3% of 1 billion requests = 3,000,000 failures
2+ hours of downtime per month
Reality is generally worse.
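The slide's arithmetic can be checked with a few lines of Java. This sketch assumes 30 dependencies that fail independently, each with 99.99% availability, and a 30-day month; `AvailabilityMath` is a hypothetical helper. Independence is a simplifying assumption, which is one reason reality is worse.

```java
// Hypothetical helper: availability of a request that needs n dependencies in series,
// assuming each is independently available with probability a.
final class AvailabilityMath {

    // Probability that all n dependencies succeed.
    static double aggregate(double a, int n) {
        return Math.pow(a, n);
    }

    // Expected number of failed requests out of a total.
    static long expectedFailures(double availability, long requests) {
        return Math.round((1.0 - availability) * requests);
    }

    // Expected downtime in hours over a 30-day month.
    static double downtimeHoursPerMonth(double availability) {
        return (1.0 - availability) * 30 * 24;
    }

    public static void main(String[] args) {
        double a = aggregate(0.9999, 30);
        System.out.printf("aggregate availability: %.4f%n", a);
        System.out.println("failures per 1 billion requests: " + expectedFailures(a, 1_000_000_000L));
        System.out.printf("downtime hours per month: %.2f%n", downtimeHoursPerMonth(a));
    }
}
```

The unrounded aggregate is about 0.99700, which gives slightly under 3,000,000 failures per billion requests (the slide rounds to 0.3%) and about 2.16 hours of downtime per month.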

36. Constraints: speed of iteration, client libraries, mixed environment

40. [Diagram: many user requests and Dependencies A through R]

41. Layers involved in each dependency call [Diagram: user requests and Dependencies A through R]:
Logic: argument validation, caches, metrics, logging, multivariate testing, routing, etc.
Serialization: URL and/or body generation
Network request: TCP/HTTP, latency, 4xx, 5xx, etc.
Deserialization: JSON/XML/Thrift/Protobuf/etc.
Logic: validation, decoration, object model, caching, metrics, logging, etc.

42. > 80% of requests rejected [Chart: median latency]
[Sat Jun 30 04:01:37 2012] [error] proxy: HTTP: disabled connection for (127.0.0.1)
"Timeout guard" daemon prio=10 tid=0x00002aaacd5e5000 nid=0x3aac runnable [0x00002aaac388f000]
   java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    - locked <0x000000055c7e8bd8> (a java.net.SocksSocketImpl)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:280)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
    at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$1.doit(ControllerThreadSocketFactory.java:91)
    at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$SocketTask.run(ControllerThreadSocketFactory.java:158)
    at java.lang.Thread.run(Thread.java:722)

57. Dashboard legend:
Circle color and size represent health and traffic volume.
Error percentage of the last 10 seconds.
Request rate.
2 minutes of request rate to show relative changes in traffic.
Circuit-breaker status.
Hosts reporting from the cluster.
Last-minute latency percentiles.
Rolling 10-second counters with 1-second granularity: successes, short-circuited (rejected), thread timeouts, thread-pool rejections, failures/exceptions.
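The circuit-breaker status and the "short-circuited (rejected)" counter reflect a breaker that stops calling a dependency once its error rate is too high. The sketch below is a deliberately simplified, count-based version with assumed thresholds; `SimpleCircuitBreaker` is hypothetical and is not Hystrix's implementation, which uses time-based rolling buckets like the counters above and lets a trial request through after a sleep window.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical count-based circuit breaker. Not thread-safe; for illustration only.
final class SimpleCircuitBreaker {
    private final int windowSize;            // how many recent outcomes to remember
    private final int minimumRequests;       // avoid tripping on a tiny sample
    private final double errorThresholdPct;  // trip when error % reaches this value
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failure
    private boolean open = false;

    SimpleCircuitBreaker(int windowSize, int minimumRequests, double errorThresholdPct) {
        this.windowSize = windowSize;
        this.minimumRequests = minimumRequests;
        this.errorThresholdPct = errorThresholdPct;
    }

    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (open) {
            // Short-circuited: reject immediately without touching the dependency.
            return fallback.get();
        }
        try {
            T result = call.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            return fallback.get();
        }
    }

    boolean isOpen() {
        return open;
    }

    private void record(boolean failure) {
        outcomes.addLast(failure);
        if (outcomes.size() > windowSize) {
            outcomes.removeFirst(); // keep only the most recent outcomes
        }
        long failures = outcomes.stream().filter(f -> f).count();
        double errorPct = 100.0 * failures / outcomes.size();
        if (outcomes.size() >= minimumRequests && errorPct >= errorThresholdPct) {
            open = true; // this sketch never closes again; a real breaker would retry later
        }
    }
}
```

With a window of 10, a minimum of 5 requests, and a 50% threshold, five consecutive failures open the breaker, after which even a healthy call is answered by the fallback.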

60. Zuul routing layer: Canary vs. Baseline, Squeeze, "Coalmine", Production

65. System relationship over network without bulkhead [Diagram: a user request and Dependencies A through R]
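A bulkhead addresses this by giving each dependency its own small, bounded thread pool, so latency in one dependency can exhaust only that pool rather than every request thread. The following is a minimal sketch using `java.util.concurrent`; the `Bulkhead` class, its pool size, and its timeout are illustrative assumptions, and a production library such as Hystrix adds metrics, circuit breaking, and configurable queues on top.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical per-dependency bulkhead: a fixed-size pool with no queue, plus a timeout.
final class Bulkhead {
    private final ThreadPoolExecutor pool;
    private final long timeoutMillis;

    Bulkhead(int threads, long timeoutMillis) {
        this.pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),     // no queue: when all threads are busy, reject
                r -> {
                    Thread t = new Thread(r, "bulkhead-worker");
                    t.setDaemon(true);        // don't keep the JVM alive
                    return t;
                });
        this.timeoutMillis = timeoutMillis;
    }

    <T> T call(Callable<T> work, Supplier<T> fallback) {
        Future<T> future;
        try {
            future = pool.submit(work);
        } catch (RejectedExecutionException e) {
            return fallback.get();            // pool saturated: fail fast instead of blocking the caller
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);              // interrupt the slow call to free the thread
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();            // the dependency call itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

The caller's thread waits at most `timeoutMillis`, and when the pool is saturated it does not wait at all; either way a hung dependency costs that dependency's threads only.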

76. Predictive + Reactive Auto-Scaling

90. One Size Fits All RESTful API: 1000+ devices [Chart: millions of API requests per day, 2010 to today]

91. We wanted to re-architect our call patterns ...

92. ... to collapse network traffic into coarse API calls with nested, conditional, concurrent execution ...
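As an illustration of what "nested, conditional, concurrent" can mean inside one coarse-grained call, here is a sketch using Java's `CompletableFuture` rather than RxJava, so it runs with only the standard library. `CoarseHomeEndpoint` and its four service calls are hypothetical stubs returning canned data. Which list is fetched depends on the first result (conditional), each step builds on the previous one (nested), and the per-title bookmark lookups run in parallel (concurrent). The device makes one request; the fan-out happens server-side.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical coarse-grained endpoint composed from fine-grained async calls.
final class CoarseHomeEndpoint {
    private final ExecutorService executor;

    CoarseHomeEndpoint(ExecutorService executor) {
        this.executor = executor;
    }

    // --- hypothetical fine-grained service calls (canned data) ---
    CompletableFuture<Boolean> isKidsProfile(String userId) {
        return CompletableFuture.supplyAsync(() -> userId.startsWith("kid"), executor);
    }

    CompletableFuture<List<String>> kidsTitles() {
        return CompletableFuture.supplyAsync(() -> List.of("Cartoon A", "Cartoon B"), executor);
    }

    CompletableFuture<List<String>> personalizedTitles(String userId) {
        return CompletableFuture.supplyAsync(() -> List.of("Drama X", "Thriller Y"), executor);
    }

    CompletableFuture<Integer> bookmarkSeconds(String userId, String title) {
        return CompletableFuture.supplyAsync(() -> title.length() * 60, executor);
    }

    // --- the single coarse call a device would make ---
    CompletableFuture<Map<String, Integer>> home(String userId) {
        return isKidsProfile(userId)
                // conditional + nested: the next call depends on the previous result
                .thenCompose(kids -> kids ? kidsTitles() : personalizedTitles(userId))
                // concurrent: start every bookmark lookup before waiting on any of them
                .thenCompose(titles -> {
                    Map<String, CompletableFuture<Integer>> pending = new LinkedHashMap<>();
                    for (String title : titles) {
                        pending.put(title, bookmarkSeconds(userId, title));
                    }
                    return CompletableFuture
                            .allOf(pending.values().toArray(new CompletableFuture<?>[0]))
                            .thenApply(ignored -> {
                                Map<String, Integer> result = new LinkedHashMap<>();
                                pending.forEach((title, f) -> result.put(title, f.join()));
                                return result;
                            });
                });
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        System.out.println(new CoarseHomeEndpoint(executor).home("user-1").join());
        executor.shutdown();
    }
}
```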

93. ... and we wanted to allow anybody to create endpoints, not just the “API Team”

95. Concurrency without each engineer reading and re-reading this book (pictured on the slide). It is an awesome book, but not everybody is going to read it, or should have to; that's the point.

97. The owner of an API should retain control of its concurrency behavior.
    public Data getData();
What if the implementation needs to change from synchronous to asynchronous? How should the client execute that method without blocking? Spawn a thread?

98. Candidate signatures:
    public Data getData();
    public void getData(Callback<T> c);
    public Future<T> getData();
    public Future<List<Future<T>>> getData();
Other options ...?

100. (Functional) Reactive Programming with RxJava
Iterable (pull)       Observable (push)
T next()              onNext(T)
throws Exception      onError(Exception)
returns               onCompleted()
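The duality in the table can be made concrete with a toy implementation. In the sketch below, the hypothetical `MiniObserver` and `MiniObservable` types (not RxJava's classes) invert control: instead of the consumer calling `next()` in a loop, the producer pushes each value with `onNext`, signals the end of the sequence with `onCompleted` where iteration would simply return, and reports failure with `onError` where `next()` would throw.

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical push-based observer: the mirror image of Iterator.next().
interface MiniObserver<T> {
    void onNext(T value);          // instead of next() returning a value
    void onError(Exception error); // instead of next() throwing
    void onCompleted();            // instead of iteration simply returning
}

// Hypothetical minimal observable: it only remembers how to push to an observer.
final class MiniObservable<T> {
    private final Consumer<MiniObserver<T>> onSubscribe;

    private MiniObservable(Consumer<MiniObserver<T>> onSubscribe) {
        this.onSubscribe = onSubscribe;
    }

    // Adapt a pull-based Iterable into a push-based source.
    static <E> MiniObservable<E> from(Iterable<E> source) {
        return new MiniObservable<E>(observer -> {
            try {
                for (E item : source) {
                    observer.onNext(item);
                }
                observer.onCompleted();
            } catch (Exception e) {
                observer.onError(e);
            }
        });
    }

    // Transform each pushed value; errors and completion pass straight through.
    <R> MiniObservable<R> map(Function<? super T, ? extends R> f) {
        return new MiniObservable<R>(downstream -> subscribe(new MiniObserver<T>() {
            @Override public void onNext(T value) { downstream.onNext(f.apply(value)); }
            @Override public void onError(Exception error) { downstream.onError(error); }
            @Override public void onCompleted() { downstream.onCompleted(); }
        }));
    }

    void subscribe(MiniObserver<T> observer) {
        onSubscribe.accept(observer);
    }
}
```

An exception thrown inside a `map` function surfaces downstream as `onError`, just as it would surface as a thrown exception from `next()` on the pull side.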

104. Iterable (pull): T next(), throws Exception, returns
    // Iterable<String>
    // that contains 75 Strings
    getDataFromLocalMemory()
        .skip(10)
        .take(5)
        .map({ s -> return s + "_transformed" })
        .forEach({ println "onNext => " + it })

Observable (push): onNext(T), onError(Exception), onCompleted()
    // Observable<String>
    // that emits 75 Strings
    getDataFromNetwork()
        .skip(10)
        .take(5)
        .map({ s -> return s + "_transformed" })
        .subscribe({ println "onNext => " + it })
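For comparison, the Iterable (pull) half maps directly onto `java.util.stream` in plain Java. In this sketch `getDataFromLocalMemory()` is a hypothetical stand-in producing the 75 strings "0" through "74"; note that Rx's `take(5)` corresponds to `Stream.limit(5)`. The Observable (push) half would need RxJava itself and is not reproduced here.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Pull-based pipeline equivalent to the Iterable example on the slide.
final class PullPipeline {
    // Hypothetical data source: 75 strings, "0" to "74".
    static List<String> getDataFromLocalMemory() {
        return IntStream.range(0, 75)
                .mapToObj(Integer::toString)
                .collect(Collectors.toList());
    }

    static List<String> transform(List<String> data) {
        return data.stream()
                .skip(10)                       // drop the first 10 elements
                .limit(5)                       // Rx take(5) is Stream.limit(5)
                .map(s -> s + "_transformed")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        transform(getDataFromLocalMemory())
                .forEach(s -> System.out.println("onNext => " + s));
    }
}
```

Running `main` prints `onNext => 10_transformed` through `onNext => 14_transformed`, one per line.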

105. One value or many, synchronous or asynchronous:
          Single                  Multiple
Sync      T getData()             Iterable<T> getData()
Async     Future<T> getData()     Observable<T> getData()

106. Sync, single value:
    String s = getData(args);
    if (s.equals(x)) {
        // do something
    } else {
        // do something else
    }

107. Sync, multiple values:
    Iterable<String> values = getData(args);
    for (String s : values) {
        if (s.equals(x)) {
            // do something
        } else {
            // do something else
        }
    }

108. Async, single value, with a Future:
    Future<String> s = getData(args);
    if (s.get().equals(x)) { // get() blocks until the value is available
        // do something
    } else {
        // do something else
    }

110. Async, single value, with a callback (Guava):
    ListenableFuture<String> s = getData(args);
    Futures.addCallback(s, new FutureCallback<String>() {
        public void onSuccess(String value) {
            if (value.equals(x)) {
                // do something
            } else {
                // do something else
            }
        }
        public void onFailure(Throwable t) {
            // handle the error
        }
    }, executor);

113. Async, single value, with Java 8 CompletableFuture:
    CompletableFuture<String> s = getData(args);
    // thenAccept, because the lambda produces no value
    s.thenAccept((v) -> {
        if (v.equals(x)) {
            // do something
        } else {
            // do something else
        }
    });
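A runnable version of the same pattern, assuming Java 8 or later. The hypothetical `getData` completes asynchronously on `CompletableFuture`'s default executor, `thenApply` derives a new value and returns a new future, and `thenAccept` is used only at the end, for the side effect of printing. Nothing blocks until the final `join()`, which is there only so the program waits before exiting.

```java
import java.util.concurrent.CompletableFuture;

// Composing a single asynchronous value without blocking.
final class AsyncSingle {
    // Hypothetical async lookup: upper-cases the key on another thread.
    static CompletableFuture<String> getData(String key) {
        return CompletableFuture.supplyAsync(() -> key.toUpperCase());
    }

    // thenApply transforms the eventual value into a new future.
    static CompletableFuture<String> describe(String key, String x) {
        return getData(key).thenApply(v -> v.equals(x) ? "matched " + v : "no match for " + v);
    }

    public static void main(String[] args) {
        // thenAccept consumes the value; join() only keeps main alive until it is printed.
        describe("abc", "ABC").thenAccept(System.out::println).join();
    }
}
```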

115. Async, single value, with a composable Future (map):
    Future<String> s = getData(args);
    s.map({ v ->
        if (v.equals(x)) {
            // do something
        } else {
            // do something else
        }
    });

118. Async, multiple values, with Observable:
    Observable<String> s = getData(args);
    s.map({ v ->
        if (v.equals(x)) {
            // do something
        } else {
            // do something else
        }
    });

121. Instead of a blocking API ...
    class VideoService {
        def VideoList getPersonalizedListOfMovies(userId);
        def VideoBookmark getBookmark(userId, videoId);
        def VideoRating getRating(userId, videoId);
        def VideoMetadata getMetadata(videoId);
    }
... create an observable API:
    class VideoService {
        def Observable<VideoList> getPersonalizedListOfMovies(userId);
        def Observable<VideoBookmark> getBookmark(userId, videoId);
        def Observable<VideoRating> getRating(userId, videoId);
        def Observable<VideoMetadata> getMetadata(videoId);
    }

122. RxJava (http://github.com/Netflix/RxJava): "a library for composing asynchronous and event-based programs using observable sequences for the Java VM". A Java port of Rx (Reactive Extensions), https://rx.codeplex.com (.NET and JavaScript by Microsoft).

123. Client code treats all interactions with the API as asynchronous. The API implementation chooses whether something is blocking or non-blocking and what resources it uses.
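One way to illustrate this in plain Java, with `CompletableFuture` standing in for `Observable`: the hypothetical `VideoRatingService` interface always returns a future, and each implementation decides how to fulfill it. `CachedRatingService` answers immediately from memory without using another thread, while `RemoteRatingService` runs a simulated blocking lookup on its own executor. `RatingClient.label` is written once and works unchanged against both.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical asynchronous API: callers always receive a future.
interface VideoRatingService {
    CompletableFuture<Double> getRating(long userId, long videoId);
}

// Implementation choice 1: answer synchronously from memory, no extra thread.
final class CachedRatingService implements VideoRatingService {
    private final Map<Long, Double> cache;

    CachedRatingService(Map<Long, Double> cache) {
        this.cache = cache;
    }

    public CompletableFuture<Double> getRating(long userId, long videoId) {
        return CompletableFuture.completedFuture(cache.getOrDefault(videoId, 0.0));
    }
}

// Implementation choice 2: isolate a blocking call on the implementation's own executor.
final class RemoteRatingService implements VideoRatingService {
    private final ExecutorService executor;

    RemoteRatingService(ExecutorService executor) {
        this.executor = executor;
    }

    public CompletableFuture<Double> getRating(long userId, long videoId) {
        return CompletableFuture.supplyAsync(() -> blockingLookup(userId, videoId), executor);
    }

    private double blockingLookup(long userId, long videoId) {
        return (videoId % 5) + 1.0; // placeholder for a network call
    }
}

final class RatingClient {
    // Client code is identical regardless of which implementation it is given.
    static CompletableFuture<String> label(VideoRatingService service, long userId, long videoId) {
        return service.getRating(userId, videoId).thenApply(r -> "video " + videoId + " rated " + r);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        System.out.println(label(new CachedRatingService(Map.of(7L, 4.5)), 1L, 7L).join());
        System.out.println(label(new RemoteRatingService(executor), 1L, 7L).join());
        executor.shutdown();
    }
}
```

Swapping the cached implementation for the remote one, or the reverse, requires no change to client code; only the implementation's resource usage changes.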

130. Optimize for each device. Leverage the server. [Diagram: Device, Netflix API Server]

131. Code

132. SDK for API Web Service Development

136. api.servletResponse.writer.print("Hello")
    $ runScript.py -e PROD sample.groovy
    Hello
    ################################################################################################
    ####### Script Logger (enabled via debug=true request parameter)
    ################################################################################################
    [0ms] INFO: User => NULL
    ################################################################################################
    ####### Script Logger (enabled via debug=true request parameter)
    ################################################################################################

144. HelloEndpoint.groovy:
    public class HelloEndpoint extends APIEndpoint {
        @Override
        public void execute(APIRequest api) throws Throwable {
            // get a request parameter from servlet request
            String alias = api.servletRequest.getParameter("name");

            // set content type and write something to servlet response
            api.servletResponse.setContentType("application/json");
            api.servletResponse.writer.print(
                JsonUtility.toJson([alias: alias, name: api.user.firstName]))
        }
    }
    $ runScript.py -e TEST --userId 1189658154 --args "name=bob" HelloEndpoint.groovy
    {"alias":"bob","name":"Old"}

150. Composing a video summary:
    public Observable getVideoSummary(APIVideo video) {
        // get id, title
        def seed = [id: video.id, title: video.getTitle(APIVideo.TitleType.REGULAR)]

        // get bookmark
        def bookmarkObservable = getBookmark(video)

        // get artwork
        def artworkObservable = getArtworkImageUrl(video)

        // merge them and accumulate into the seed
        return Observable.merge(bookmarkObservable, artworkObservable)
            .reduce(seed, { aggregate, current -> aggregate << current })
            .map({ [(video.id.toString()): it] })
    }
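The same seed-and-accumulate idea can be sketched with the standard library, using `CompletableFuture.thenCombine` in place of `merge` plus `reduce`, since there are exactly two concurrent inputs here. `VideoSummary`, its nested `Video` class, `getBookmark`, and `getArtworkImageUrl` are hypothetical stubs with canned data, and the artwork URL uses the reserved example.com domain. As in the Groovy version, the result is keyed by the video id as a string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical JDK-only analogue of the slide's getVideoSummary.
final class VideoSummary {
    // Minimal hypothetical video type (the slide uses APIVideo).
    static final class Video {
        final long id;
        final String title;

        Video(long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Hypothetical async lookups returning canned data.
    static CompletableFuture<Map<String, Object>> getBookmark(Video video) {
        return CompletableFuture.supplyAsync(() -> Map.<String, Object>of("position", 120));
    }

    static CompletableFuture<Map<String, Object>> getArtworkImageUrl(Video video) {
        return CompletableFuture.supplyAsync(
                () -> Map.<String, Object>of("artwork", "http://example.com/" + video.id + ".jpg"));
    }

    static CompletableFuture<Map<String, Map<String, Object>>> getVideoSummary(Video video) {
        // seed: id and title are available immediately
        Map<String, Object> seed = new HashMap<>();
        seed.put("id", video.id);
        seed.put("title", video.title);

        // both lookups are already running; combine them once both complete
        CompletableFuture<Map<String, Object>> accumulated = getBookmark(video)
                .thenCombine(getArtworkImageUrl(video), (bookmark, artwork) -> {
                    Map<String, Object> aggregate = new HashMap<>(seed);
                    aggregate.putAll(bookmark);
                    aggregate.putAll(artwork);
                    return aggregate;
                });

        // key the result by video id, as the Groovy version does
        return accumulated.thenApply(aggregate -> Map.of(String.valueOf(video.id), aggregate));
    }
}
```

With more than two sources, or a variable number of them, `CompletableFuture.allOf` followed by a fold would be the closer analogue of `merge` plus `reduce`.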

157. $ uploadScript.py -e TEST /test/hello sample.groovy
    {
      "active": false,
      "allocationPercentage": 0,
      "creationDate": "Mon Nov 11 05:38:51 UTC 2013",
      "revision": 1,
      "userAuthorizationRequired": true,
      "userAuthorizationType": "https"
    }

160. $ uploadScript.py -e TEST /test/hello sample.groovy
    {
      "active": false,
      "allocationPercentage": 0,
      "creationDate": "Mon Nov 11 05:40:26 UTC 2013",
      "revision": 2,
      "userAuthorizationRequired": true,
      "userAuthorizationType": "https"
    }
    $ activateScript.py -e TEST --revision 2 /test/hello
    {
      "active": true,
      "allocationPercentage": 100,
      "creationDate": "Mon Nov 11 05:40:26 UTC 2013",
      "previousRevision": null,
      "revision": 2,
      "userAuthorizationRequired": true,
      "userAuthorizationType": "https"
    }

164. $ uploadScript.py -e TEST /test/hello sample.groovy
    {
      "active": false,
      "allocationPercentage": 0,
      "creationDate": "Mon Nov 11 05:40:26 UTC 2013",
      "revision": 2,
      "userAuthorizationRequired": true,
      "userAuthorizationType": "https"
    }
    $ activateScript.py -e TEST --revision 3 /test/hello
    {
      "active": true,
      "allocationPercentage": 100,
      "creationDate": "Mon Nov 11 05:42:05 UTC 2013",
      "previousRevision": 2,
      "revision": 3,
      "userAuthorizationRequired": true,
      "userAuthorizationType": "https"
    }
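The outputs above suggest a simple lifecycle: each upload to an endpoint path creates a new, inactive revision, and activating a revision makes it live while recording which revision was live before. The sketch below models only that bookkeeping for a single path. `EndpointRegistry` and its methods are hypothetical, not the actual scripts' behavior; it treats activation as all-or-nothing and does not model `allocationPercentage` values between 0 and 100.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-path registry of endpoint script revisions.
final class EndpointRegistry {
    private final List<String> revisions = new ArrayList<>(); // index + 1 = revision number
    private Integer activeRevision = null;                     // null: nothing live yet
    private Integer previousRevision = null;

    // Store new script source as a new, inactive revision and return its number.
    int upload(String scriptSource) {
        revisions.add(scriptSource);
        return revisions.size();
    }

    // Make a revision live, remembering the one it replaces.
    void activate(int revision) {
        if (revision < 1 || revision > revisions.size()) {
            throw new IllegalArgumentException("unknown revision " + revision);
        }
        previousRevision = activeRevision;
        activeRevision = revision;
    }

    // Switch back to the previously active revision.
    void rollback() {
        if (previousRevision == null) {
            throw new IllegalStateException("no previous revision to roll back to");
        }
        activate(previousRevision);
    }

    Integer activeRevision() {
        return activeRevision;
    }

    Integer previousRevision() {
        return previousRevision;
    }

    String activeSource() {
        return activeRevision == null ? null : revisions.get(activeRevision - 1);
    }
}
```

Keeping every revision and the previous pointer makes rollback a constant-time switch rather than a redeploy, which fits the goal of letting many teams iterate on endpoints quickly.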

165. Future

166. [Chart: millions of API requests per day, 2008 to today]

175. Architecture (diagram layers, top to bottom):
Endpoints: /tv/home, /android/home, /ps3/home
Functional Reactive Dynamic Endpoints
Asynchronous Java API
Hystrix fault-isolation layer
Dependency thread pools: A 10, B 8, C 10, D 15, E 5, F 10, G 10, H 10, I 5, J 8, K 15, L 4, M 5, N 10, O 10, P 10, Q 8, R 10, S 8, T 10 threads

176. RxJava in Hystrix 1.3+ (https://github.com/Netflix/Hystrix):
    Observable<User> u = new GetUserCommand(id).observe();
    Observable<Geo> g = new GetGeoCommand(request).observe();

    Observable.zip(u, g, { user, geo ->
        return [username: user.getUsername(), currentLocation: geo.getCounty()]
    })
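Without RxJava or Hystrix, a similar "zip two independent calls" shape can be sketched with `CompletableFuture.thenCombine`, while `completeOnTimeout` (Java 9+) supplies a fallback value when a call is too slow, loosely imitating a command's timeout and fallback. `ZipCommands`, `getUsername`, `getCountry`, the 500 ms timeouts, and the fallback values are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins for GetUserCommand and GetGeoCommand, each with a
// timeout and a fallback value, combined like Observable.zip.
final class ZipCommands {
    static CompletableFuture<String> getUsername(long id) {
        return CompletableFuture.supplyAsync(() -> "user" + id)
                .completeOnTimeout("anonymous", 500, TimeUnit.MILLISECONDS); // fallback
    }

    static CompletableFuture<String> getCountry(long simulatedDelayMillis) {
        return CompletableFuture.supplyAsync(() -> {
                    sleepQuietly(simulatedDelayMillis); // simulate a slow remote call
                    return "US";
                })
                .completeOnTimeout("unknown", 500, TimeUnit.MILLISECONDS); // fallback
    }

    // Wait for both results, then build the response from the pair (like zip).
    static CompletableFuture<Map<String, String>> profile(long id, long geoDelayMillis) {
        return getUsername(id).thenCombine(getCountry(geoDelayMillis),
                (user, country) -> Map.of("username", user, "currentLocation", country));
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(profile(7L, 0L).join());
        System.out.println(profile(7L, 2_000L).join());
    }
}
```

Unlike a thread-isolated Hystrix command, `completeOnTimeout` only completes the future early; the slow task keeps running in the background, so this sketch provides a fallback but not a bulkhead.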

179. Ben Christensen
@benjchristensen
http://www.linkedin.com/in/benjchristensen
jobs.netflix.com

Further reading:
Functional Reactive in the Netflix API with RxJava, http://techblog.netflix.com/2013/02/rxjava-netflix-api.html
Optimizing the Netflix API, http://techblog.netflix.com/2013/01/optimizing-netflix-api.html
Application Resilience in a Service-oriented Architecture, http://programming.oreilly.com/2013/06/application-resilience-in-a-service-oriented-architecture.html
Fault Tolerance in a High Volume, Distributed System, http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
Making the Netflix API More Resilient, http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html
