Performance and Fault Tolerance for the Netflix API - July 18 2012

Presented at Silicon Valley Cloud Computing Group on July 18 2012 (http://www.meetup.com/cloudcomputing/events/71823882/)

Audio available at: http://g33ktalk.com/performance-and-fault-tolerance-for-the-netflix-api/

The Netflix API receives over a billion requests a day, which translates into multiple billions of calls to underlying systems in the Netflix service-oriented architecture. These requests come from more than 800 different devices, ranging from gaming consoles like the PS3, Xbox and Wii to set-top boxes, TVs and mobile devices such as Android and iOS.

This presentation describes how the Netflix API supports those devices and achieves fault tolerance in a distributed architecture while depending on dozens of systems which can fail at any time. It also explains how a new system design allows each device to optimize API calls to its unique needs and leverage concurrency on the server side to improve performance.

Comments

  • Thanks a lot for sharing! Is the code (circuit breaker/dependency command, request-scoped batching, request collapsing) planned to be open sourced (http://techblog.netflix.com/2012/07/open-source-at-netflix-by-ruslan.html)? It seems to pose great challenges, particularly w.r.t. automatically testing for and detecting race conditions, contention, corruption, and deadlock flaws...
Presentation Transcript

  • Performance and Fault Tolerance for the Netflix API. Silicon Valley Cloud Computing Group - July 18 2012. Ben Christensen, Software Engineer – API Platform at Netflix. @benjchristensen | http://www.linkedin.com/in/benjchristensen | http://techblog.netflix.com/ (slide 1)
  • [Diagram: Netflix API fanning out to Dependencies A-R] (slide 2) The Netflix API serves all streaming devices and acts as the broker between backend Netflix systems and the user interfaces running on the 800+ devices that support Netflix streaming. More than 1 billion incoming calls per day are received, which in turn fan out to several billion outgoing calls (averaging a ratio of 1:6) to dozens of underlying subsystems, with peaks of over 200k dependency requests per second.
  • [Diagram: Netflix API fanning out to Dependencies A-R] (slide 3) First half of the presentation discusses resilience engineering implemented to handle failure and latency at the integration points with the various dependencies.
  • Dozens of dependencies. One going bad takes everything down. 99.99%^30 = 99.7% uptime. 0.3% of 1 billion = 3,000,000 failures. 2+ hours downtime/month even if all dependencies have excellent uptime. Reality is generally worse. (slide 4) Even when all dependencies are performing well, the aggregate impact of even 0.01% downtime on each of dozens of services equates to potentially hours a month of downtime if not engineered for resilience.
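    To make the slide's arithmetic explicit (a verification added here, using only the slide's own numbers of 30 dependencies, 99.99% per-dependency uptime and 1 billion requests/day):

        0.9999^30 ≈ 0.9970, i.e. 99.7% aggregate uptime
        0.3% of 10^9 requests/day = 3,000,000 failed requests/day
        0.3% of a 720-hour month ≈ 2.16 hours of downtime/month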
  • (slides 5-6: images only, no transcript text)
  • (slide 7) Latency is far worse for system resilience than failure. Failures naturally "fail fast" and shed load, whereas latency backs up queues, threads and system resources; if isolation techniques are not used, it can cause an entire system to fail.
  • > 80% of requests rejected; median latency spiking (slide 8)
    [Sat Jun 30 04:01:37 2012] [error] proxy: HTTP: disabled connection for (127.0.0.1)
    "Timeout guard" daemon prio=10 tid=0x00002aaacd5e5000 nid=0x3aac runnable [0x00002aaac388f000]
       java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        - locked <0x000000055c7e8bd8> (a java.net.SocksSocketImpl)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
        at java.net.Socket.connect(Socket.java:579)
        at java.net.Socket.connect(Socket.java:528)
        at java.net.Socket.<init>(Socket.java:425)
        at java.net.Socket.<init>(Socket.java:280)
        at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
        at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$1.doit(ControllerThreadSocketFactory.java:91)
        at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$SocketTask.run(ControllerThreadSocketFactory.java:158)
        at java.lang.Thread.run(Thread.java:722)
    This is an example of what a system looks like when high latency occurs without load shedding and isolation. Backend latency spiked (from <100ms to >1000ms at the median, >10,000ms at the 90th percentile) and saturated all available resources, resulting in the HTTP layer rejecting over 80% of requests.
  • No single dependency should take down the entire app. Fallback. Fail silent. Fail fast. Shed load. (slide 9) It is a requirement of high volume, high availability applications to build fault and latency tolerance into their architecture. Infrastructure is an aspect of resilience engineering, but it cannot be relied upon by itself: software must be resilient.
  • (slide 10) Netflix uses a combination of aggressive network timeouts, tryable semaphores and thread pools to isolate dependencies and limit the impact of both failure and latency.
  • Tryable semaphores for "trusted" clients and fallbacks. Separate threads for "untrusted" clients. Aggressive timeouts on threads and network calls to "give up and move on". Circuit breakers as the "release valve". (slide 11)
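    As an illustration of the tryable-semaphore technique listed above, here is a minimal Java sketch (hypothetical names; not the actual Netflix implementation, whose DependencyCommand design is described later in the deck). A java.util.concurrent.Semaphore caps concurrent calls to one dependency; when it is saturated, callers are rejected immediately into a fallback instead of queuing or blocking:

        import java.util.concurrent.Semaphore;

        // Sketch: tryable-semaphore isolation around one dependency.
        // At most 10 concurrent executions; excess load is shed to a fallback.
        public class SemaphoreIsolatedClient {
            private final Semaphore executionPermits = new Semaphore(10);

            public String getRecommendations(long userId) {
                // tryAcquire() never blocks: get a permit or fail fast.
                if (!executionPermits.tryAcquire()) {
                    return getFallback(); // shed load instead of queuing
                }
                try {
                    return callDependency(userId); // the actual network call
                } catch (Exception e) {
                    return getFallback(); // fail silent: degrade, don't propagate
                } finally {
                    executionPermits.release();
                }
            }

            private String callDependency(long userId) {
                // Placeholder for the real network call to a backend service.
                return "live-response-for-" + userId;
            }

            private String getFallback() {
                // Static or cached response requiring no network access.
                return "default-recommendations";
            }
        }

    Note the trade-off the deck goes on to describe: a semaphore cannot time out a call already in flight, which is why separate threads are used for "untrusted" clients.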
  • (slide 12) With isolation techniques the application container is now segmented according to how it uses its underlying dependencies, instead of using a single shared resource pool to communicate with all of them.
  • (slide 13) A single dependency failing will no longer be permitted to take more resources than it was allocated, and can have its impact controlled.
  • (slide 14) In this case the backend service has become latent and saturates all available threads allocated to it, so further requests to it are rejected (the orange line) instead of blocking or using up all available system threads.
  • (slide 15: image only, no transcript text)
  • Thread-pool size + queue size: 30 rps x 0.2 seconds = 6, + breathing room = 10 threads. Thread-pool queue size: 5-10 (0 doesn't work, but get close to it). Queuing is not free. (slide 16)
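    The sizing arithmetic on this slide is an instance of Little's law (the connection is a gloss added here; the slide shows only the numbers): the average number of requests in flight equals the arrival rate times the per-request latency, so 30 requests/second x 0.2 seconds/request = 6 concurrent executions, rounded up to a pool of 10 threads to leave breathing room for bursts and latency spikes. Re-running the same formula with a dependency's own request rate and worst tolerated latency gives a starting size for each pool.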
  • Cost of thread @ ~60rps: mean - median - 90th - 99th (time in ms); time for thread to execute; time user thread waited. (slide 17) The Netflix API has ~30 thread pools with 5-20 threads in each. A common question and concern is what impact this has on performance. Here is a sample of a dependency circuit for 24 hours from the Netflix API production cluster with a rate of 60rps per server. Each execution occurs in a separate thread, with mean, median, 90th and 99th percentile latencies shown in the first 4 legend values. The second group of 4 values is the user thread waiting on the dependency thread, and shows the total time including queuing, scheduling, execution and waiting for the return value from the Future. This example was chosen since it is relatively high volume and low latency, so the cost of a separate thread is potentially more of a concern than if the backend network latency was 100ms or higher.
  • Cost of thread @ ~60rps, cost: 0ms. (slide 18) At the median (and lower) there is no cost to having a separate thread.
  • Cost of thread @ ~60rps, cost: 3ms. (slide 19) At the 90th percentile there is a cost of 3ms for having a separate thread.
  • Cost of thread @ ~60rps, cost: 9ms. (slide 20) At the 99th percentile there is a cost of 9ms for having a separate thread. Note however that the increase in cost is far smaller than the increase in execution time of the separate thread, which jumped from 2 to 28, whereas the cost jumped from 0 to 9. This overhead at the 90th percentile and higher for circuits such as these has been deemed acceptable for the benefits of resilience achieved. For circuits that wrap very low latency requests (such as those primarily hitting in-memory caches) the overhead can be too high, and in those cases we choose to use tryable semaphores, which do not allow for timeouts but provide most of the resilience benefits without the overhead. The overhead in general is small enough that we prefer the isolation benefits of a separate thread.
  • Cost of thread @ ~75rps: mean - median - 90th - 99th (time in ms); time for thread to execute; time user thread waited. (slide 21) This is a second sample of a dependency circuit for 24 hours from the Netflix API production cluster, with a rate of 75rps per server. As with the first example, this was chosen since it is relatively high volume and low latency, so the cost of a separate thread is potentially more of a concern than if the backend network latency was 100ms or higher. Each execution occurs in a separate thread, with mean, median, 90th and 99th percentile latencies shown in the first 4 legend values. The second group of 4 values is the user thread waiting on the dependency thread, and shows the total time including queuing, scheduling, execution and waiting for the return value from the Future.
  • Cost of thread @ ~75rps, cost: 0ms. (slide 22) At the median (and lower) there is no cost to having a separate thread.
  • Cost of thread @ ~75rps, cost: 2ms. (slide 23) At the 90th percentile there is a cost of 2ms for having a separate thread.
  • Cost of thread @ ~75rps, cost: 2ms. (slide 24) At the 99th percentile there is a cost of 2ms for having a separate thread.
  • Netflix DependencyCommand Implementation (slide 25)
  • Netflix DependencyCommand Implementation (slide 26)
    (1) Construct DependencyCommand object: on each dependency invocation, a DependencyCommand object is constructed with the arguments necessary to make the call to the server. For example: DependencyCommand command = new DependencyCommand(arg1, arg2)
    (2) Execute synchronously or asynchronously: execution of the command can be performed synchronously or asynchronously: K value = command.execute() or Future<K> value = command.queue(). The synchronous call execute() invokes queue().get() unless the command is specified to not run in a thread.
    (3) Is the circuit open? Upon execution, the command first checks with the circuit breaker to ask "is the circuit open?". If the circuit is open (tripped), the command will not be executed and flow is routed to (8) DependencyCommand.getFallback(). If the circuit is closed, the command is executed and flow continues to (5) DependencyCommand.run().
    (4) Is the thread pool/queue full? If the thread pool and queue associated with the command are full, the execution is rejected and immediately routed through fallback (8). If the command does not run within a thread, this logic is skipped.
    (5) DependencyCommand.run(): the concrete implementation's run() method is executed.
    (5a) Command timeout: the run() method occurs within a thread with a timeout, and if it takes too long the thread throws a TimeoutException. In that case the response is routed through fallback (8) and the eventual run() method response is discarded. If the command does not run within a thread, this logic is not applicable.
  • Netflix DependencyCommand Implementation (slide 27)
    (6) Is the command successful? Application flow is routed based on the response from the run() method.
    (6a) Successful response: if no exceptions are thrown and a response is returned (including a null value), it proceeds to return the response after some logging and a performance check.
    (6b) Failed response: when a response throws an exception it is marked as "failed", which contributes to potentially tripping the circuit open, and application flow is routed to (8) DependencyCommand.getFallback().
    (7) Calculate circuit health: successes, failures, rejections and timeouts are all reported to the circuit breaker, which maintains a rolling set of counters that calculate statistics. These stats are then used to determine when the circuit should "trip" and become open, at which point subsequent requests are short-circuited until a period of time passes and requests are permitted again after health checks succeed.
    (8) DependencyCommand.getFallback(): the fallback is performed whenever a command execution fails (an exception is thrown by (5) DependencyCommand.run()) or when it is (3) short-circuited because the circuit is open. The intent of the fallback is to provide a generic response, without any network dependency, from an in-memory cache or other static logic.
    (8a) Fallback not implemented: if DependencyCommand.getFallback() is not implemented, an exception will be thrown and the caller is left to deal with it.
    (8b) Fallback successful: if the fallback returns a response, it will be returned to the caller.
    (8c) Fallback failed: if DependencyCommand.getFallback() fails and throws an exception, the caller is left to deal with it. It is considered poor practice to have a fallback implementation that can fail; a fallback should not perform any logic that could fail. Semaphores are wrapped around fallback execution to protect against software bugs that do not comply with this principle, particularly if the fallback itself tries to perform a network call that can be latent.
    (9) Return successful response: if (6a) occurred, the successful response is returned to the caller regardless of whether it was latent or not.
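    To make the flow above concrete, here is a simplified, hypothetical Java sketch of a command with the described behavior (an illustration written for this transcript, not the actual Netflix code; the real implementation also includes semaphore isolation, fallback protection and rolling statistics):

        import java.util.concurrent.*;
        import java.util.concurrent.atomic.AtomicInteger;

        // Sketch of the DependencyCommand flow from slides 26-27:
        // circuit check -> bounded thread pool -> run() with timeout -> fallback.
        public abstract class DependencyCommand<K> {
            private final ExecutorService pool;     // bounded per-dependency pool
            private final CircuitBreaker breaker;   // rolling success/failure stats
            private final long timeoutMillis;

            protected DependencyCommand(ExecutorService pool, CircuitBreaker breaker,
                                        long timeoutMillis) {
                this.pool = pool;
                this.breaker = breaker;
                this.timeoutMillis = timeoutMillis;
            }

            protected abstract K run() throws Exception; // the wrapped network call
            protected abstract K getFallback();          // static/cached response

            public K execute() {
                if (breaker.isOpen()) {                  // (3) short-circuit
                    return getFallback();                // (8)
                }
                try {
                    Future<K> future = pool.submit(this::run);  // (4)+(5): may reject
                    K value = future.get(timeoutMillis, TimeUnit.MILLISECONDS); // (5a)
                    breaker.markSuccess();               // (7)
                    return value;                        // (6a)/(9)
                } catch (RejectedExecutionException | TimeoutException
                         | ExecutionException | InterruptedException e) {
                    breaker.markFailure();               // (6b)+(7)
                    return getFallback();                // (8)
                }
            }
        }

        // Deliberately naive breaker for the sketch: opens after 5 straight
        // failures. The real design uses rolling windows and health checks.
        class CircuitBreaker {
            private final AtomicInteger consecutiveFailures = new AtomicInteger();
            boolean isOpen() { return consecutiveFailures.get() >= 5; }
            void markSuccess() { consecutiveFailures.set(0); }
            void markFailure() { consecutiveFailures.incrementAndGet(); }
        }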
  • Netflix DependencyCommand Implementation: fallbacks drawn from a cache, eventual consistency, stubbed data, or an empty response. (slide 28)
  • Netflix DependencyCommand Implementation (slide 29)
  • So, how does it work in the real world? (slide 30)
  • Visualizing circuits in near-realtime (latency is single-digit seconds, generally 1-2). (slide 31) This is an example of our monitoring system, which provides low-latency (typically 1-2 seconds) visibility into the traffic and health of all DependencyCommand circuits across a cluster.
  • Dashboard legend (slide 32): circle color and size represent health (error percentage) and traffic volume; request rate over the last 10 seconds; 2 minutes of request rate to show relative changes in traffic; circuit-breaker status; hosts reporting from the cluster; last-minute latency percentiles; and rolling 10-second counters with 1-second granularity for successes, thread timeouts, short-circuited (rejected) requests, thread-pool rejections, and failures/exceptions.
  • API daily incoming vs outgoing (weekends visible in the traffic pattern): 8-10 billion DependencyCommand executions (threaded) from 1.2-1.6 billion incoming requests. (slide 33)
  • API hourly incoming vs outgoing: peaks at 200k+ threaded DependencyCommand executions/second from peaks of 30k+ incoming requests/second. (slide 34)
  • (slide 35) This view of the dashboard was captured during a latency monkey simulation to test resilience against latency (http://techblog.netflix.com/2011/07/netflix-simian-army.html), and shows how several of the DependencyCommands degraded in health and showed timeouts, thread-pool rejections, short-circuiting and failures. The DependencyCommands of dependencies not affected by latency were unaffected. During this test no users were prevented from using Netflix on any devices. Instead, fallbacks and graceful degradation occurred, and as soon as latency was removed all systems returned to health within seconds.
  • (slide 36) This was another latency monkey simulation, one that affected a single DependencyCommand.
  • Latency spikes from a ~30ms median to first 2000+ ms, then 10,000+ ms; success drops off while timeouts and short-circuiting shed load; peak of 100M+ incoming requests (30k+/second). (slide 37) These graphs show the full duration of a latency monkey simulation (and look similar to real production events): latency occurred, and the DependencyCommand timed out, short-circuited the requests and returned fallbacks.
  • (slide 38: image only, no transcript text)
  • Fallback. Fail silent. Fail fast. Shed load. (slide 39)
  • [Diagram: Netflix API fanning out to Dependencies A-R] (slide 40) Second half of the presentation discusses architectural changes to enable optimizing the API for each Netflix device, as opposed to a generic one-size-fits-all API which treats all devices the same.
  • Single network request from clients (use LAN instead of WAN). [Diagram: device to Netflix API server] Landing page requires ~a dozen API requests. (slide 41) The one-size-fits-all API results in chatty clients, some requiring ~a dozen requests to render a page.
  • Single network request from clients (use LAN instead of WAN): some clients are limited in the number of concurrent network connections. (slide 42)
  • Single network request from clients (use LAN instead of WAN): network latency makes this even worse (mobile, home, wifi, geographic distance, etc). (slide 43)
  • Single network request from clients (use LAN instead of WAN). [Diagram: device to Netflix API server] Push the call pattern to the server... (slide 44) The client should make a single request and push the chatty part to the server, where low-latency networks and multi-core servers can perform the work far more efficiently.
  • Single network request from clients (use LAN instead of WAN). [Diagram: device to Netflix API server] ...and eliminate redundant calls. (slide 45)
  • (slide 46: image only, no transcript text)
  • Send only the bytes that matter (optimize responses for each client). [Diagram: part of the client now runs on the server] (slide 47) The client now extends over the network barrier and runs a portion of itself in the server. The client sends requests over HTTP to its other half running in the server, which can then access a Java API at a very granular level to retrieve exactly what it needs and return an optimized response suited to the device's exact requirements and user experience.
  • Send only the bytes that matter (optimize responses for each client): the client retrieves and delivers exactly what its device needs, in its optimal format. (slide 48)
  • Send only the bytes that matter (optimize responses for each client): the client interface is now a Java API, in front of the service layer, that the client interacts with at a granular level. (slide 49)
  • Leverage concurrency (but abstract away its complexity). [Diagram: device client to server client, Netflix API, service layer] (slide 50)
  • Leverage concurrency (but abstract away its complexity): no synchronized, volatile, locks, Futures or Atomic*/Concurrent* classes in client-server code. (slide 51) Concurrency is abstracted away behind an asynchronous API, and data is retrieved, transformed and composed using higher-order functions (such as map, mapMany, merge, zip, take, toList, etc). Groovy is used for its closure support, which lends itself well to the functional programming style.
  • Functional Reactive Programming: composable asynchronous functions. Service calls are all asynchronous; functional programming with higher-order functions. Fully asynchronous API: clients can't block. (slide 52)

        def video1Call = api.getVideos(api.getUser(), 123456, 7891234);
        def video2Call = api.getVideos(api.getUser(), 6789543);
        // higher-order functions used to compose asynchronous calls together
        wx.merge(video1Call, video2Call).toList().subscribe([
            onNext: { listOfVideos ->
                for (video in listOfVideos) {
                    response.getWriter().println("video: " + video.id + " " + video.title);
                }
            },
            onError: { exception ->
                response.setStatus(500);
                response.getWriter().println("Error: " + exception.getMessage());
            }
        ])
  • Request collapsing: batch, don't burst. (slide 53) The DependencyCommand resilience layer is leveraged for concurrency, including optimizations such as request collapsing (automated batching), which bundles bursts of calls to the same service into batches without the client code needing to understand or manually optimize for batching. This is particularly important when client code becomes highly concurrent and data is requested in multiple different code paths, sometimes written by different engineers. Request collapsing automatically captures and batches the calls together. The collapsing functionality also supports sharded architectures, so a batch of requests can be sharded into sub-batches if the client-server relationship requires requests to be routed to a sharded backend. (A sketch of the collapsing mechanism follows slide 55's notes below.)
  • Request collapsing: batch, don't burst. 100:1 collapsing ratio (batch size of ~100). (slide 54) This graph shows an extreme example of a dependency where we collapse requests at a ratio of 100:1.
  • Request collapsing: batch, don't burst. 4,000 rps instead of 400,000 rps at a 100:1 collapsing ratio (batch size of ~100). (slide 55) This is the same graph, but on a power scale instead of linear, so the blue line (actual network requests) shows up.
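    As a rough illustration of the collapsing mechanism described on slide 53, here is a minimal, hypothetical Java sketch (illustrative names; the actual Netflix implementation lives in the DependencyCommand layer and also supports sharding into sub-batches). Callers asking for individual items within a short window each receive a Future, and one batched network call is made per window:

        import java.util.*;
        import java.util.concurrent.*;

        // Sketch: request collapsing ("batch, don't burst"). Bursts of
        // single-item requests are gathered for 10ms, then executed as
        // one batched call to the backend.
        public class VideoRequestCollapser {
            private final ScheduledExecutorService timer =
                    Executors.newSingleThreadScheduledExecutor();
            private final Object lock = new Object();
            private Map<Long, CompletableFuture<String>> pending = new HashMap<>();

            public CompletableFuture<String> getVideo(long videoId) {
                synchronized (lock) {
                    boolean firstInWindow = pending.isEmpty();
                    CompletableFuture<String> f = pending
                            .computeIfAbsent(videoId, id -> new CompletableFuture<>());
                    if (firstInWindow) {
                        // First request opens a 10ms collapsing window.
                        timer.schedule(this::flush, 10, TimeUnit.MILLISECONDS);
                    }
                    return f; // duplicate ids share the same Future
                }
            }

            private void flush() {
                Map<Long, CompletableFuture<String>> batch;
                synchronized (lock) {
                    batch = pending;
                    pending = new HashMap<>();
                }
                // One batched network call instead of N individual ones.
                Map<Long, String> results = fetchVideosBatch(batch.keySet());
                batch.forEach((id, future) -> future.complete(results.get(id)));
            }

            private Map<Long, String> fetchVideosBatch(Set<Long> ids) {
                // Placeholder for the real batched backend call.
                Map<Long, String> out = new HashMap<>();
                for (long id : ids) out.put(id, "video-" + id);
                return out;
            }
        }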
  • Request-scoped caching: short-lived and concurrency aware. (slide 56) Another use of the DependencyCommand layer is to allow client code to perform requests without concern about duplicate network calls due to concurrency. Futures are atomically cached using putIfAbsent in a request scope shared via the ThreadLocals of each thread, so clients can request data in multiple code paths without inefficiency concerns.
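    A minimal, hypothetical Java sketch of that putIfAbsent pattern (illustrative names; propagation of the request scope across the worker threads a real request spans is elided here for brevity):

        import java.util.concurrent.*;

        // Sketch: request-scoped caching of Futures. Within one request,
        // concurrent lookups of the same key share a single Future, so
        // duplicate code paths never trigger duplicate network calls.
        public class RequestScopedCache {
            private static final ThreadLocal<ConcurrentHashMap<String, Future<Object>>> SCOPE =
                    ThreadLocal.withInitial(ConcurrentHashMap::new);

            @SuppressWarnings("unchecked")
            public static <T> Future<T> getOrExecute(String cacheKey, Callable<T> work,
                                                     ExecutorService pool) {
                ConcurrentHashMap<String, Future<Object>> cache = SCOPE.get();
                FutureTask<Object> task = new FutureTask<>(() -> (Object) work.call());
                // Atomic: only the first caller's task is stored and executed.
                Future<Object> existing = cache.putIfAbsent(cacheKey, task);
                if (existing == null) {
                    pool.submit(task); // first caller triggers the real work
                    existing = task;
                }
                return (Future<T>) existing;
            }

            // Called at the end of each request to drop the scope.
            public static void endRequest() {
                SCOPE.remove();
            }
        }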
  • Request caching: stateless. (slide 57) Some examples of request caching de-duplicating backend calls. On some circuits the impact is reasonably high, while on most it is a small percentage or none at all; but overall it provided a measurable drop in network calls, and in some client-code use cases it significantly improved latency by eliminating unnecessary network calls.
  • Optimize for each device. Leverage the server. [Diagram: device and Netflix API server] (slide 58) The Netflix API is becoming a platform that empowers user-interface teams to build their own API endpoints, optimized to their client applications and devices.
  • Netflix is hiring: http://jobs.netflix.com
    Fault Tolerance in a High Volume, Distributed System: http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
    Making the Netflix API More Resilient: http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html
    Embracing the Differences: Inside the Netflix API Redesign: http://techblog.netflix.com/2012/07/embracing-differences-inside-netflix.html
    Ben Christensen | @benjchristensen | http://www.linkedin.com/in/benjchristensen (slide 59)