Netflix API: Keynote at Disney Tech Conference
Disney held the first in a series of internal technical conferences in Orlando, FL, this one focused entirely on APIs. These slides are from my keynote presentation which kicked off the event. The slides focus on the Netflix API, API design, anti-patterns, technical revolutions, resiliency, scaling, test frameworks and other constructs that support the Netflix infrastructure.

Published in: Technology, Business
  • Thomas Kuhn published The Structure of Scientific Revolutions in 1962. The book was pretty controversial at the time, and in fact, still offers some pretty contentious views.
  • Kuhn describes the predominant view of scientific practice as an effort to continuously “discover” reality. Scientific discoveries therefore build on top of past discoveries over time, continually getting closer to a comprehensive view of reality.
  • Eventually, in principle, science will discover the full truth about reality.
  • Kuhn’s view is that science does not “discover” reality or build on top of past “discoveries”. Rather, he believes that the majority of scientific work (which he labels “normal science”) is focused on puzzle solving on top of an initial assumption.
  • For example, the common belief centuries ago was that the earth was the center of the universe, with all of the planets and the sun revolving around it (the geocentric view).
  • Given that assumption, normal science builds hypotheses and performs experiments against it. The hypotheses can be proven or disproven within that context and they can build on each other, but they are not progressing towards an unveiling of reality.
  • During the course of normal science, however, anomalies are encountered. These anomalies are often cast aside as errors in observation/tests or for some other reason. But over time, they mount up or become so powerful in numbers or significance that they cannot be ignored.
  • Regarding the geocentric view, the phases of Venus became a very powerful anomaly. This anomaly essentially demonstrates that the shadows and reflections on Venus from the Sun, as well as how it moves throughout the sky, make it impossible for it to revolve around the Earth.
  • When the anomalies encountered are large or frequent enough, they give rise to a competing point of view, a competing assumption. Hypotheses and experiments begin on the new assumption, typically by scientists who otherwise do not practice normal science on the initial assumption.
  • The phases of Venus anomaly ultimately surfaced the competing assumption that the Sun is at the center and the Earth is one of many other objects revolving around it (the heliocentric view).
  • If the competing assumption gains enough traction through revolutionary science and becomes strong enough, there is a scientific revolution where the original assumption is completely overthrown in favor of the new one. Kuhn coined the term “Paradigm Shift” to represent this. It is also important to note that paradigm shifts can take a long time to develop and to conclude. But these shifts are absolute, meaning only one of these paradigms can be the focus of normal science.
  • To be clear, this revolution is not one where the new paradigm is necessarily better than the old one, as neither truly represents reality. It is just a new assumption that appears to fill the holes of the original or is more representative of modern thinking. In fact, over time, the new paradigm will likely suffer its own anomalies and could very well fall prey to another competing paradigm.
  • The pattern described in his view of scientific revolutions is often seen in technological development. Specifically, there are many such cases in the API industry. The following are a few of these examples.
  • The migration to JSON is mostly due to improved language support and the slenderness of JSON’s payload. As a result, the debate seems to be shifting pretty quickly towards JSON.
  • Here is a breakdown of formats used by a range of APIs in APIHub. The distribution here clearly shows a shift towards JSON over XML.
  • Similarly, from this (out of date, but still relevant, especially since the trend is continuing in the same direction) bar graph, more and more public APIs are launching with no XML support at all, only supporting JSON. The trends are clearly showing that the revolution is underway and that JSON is becoming the new paradigm.
  • Nonetheless, it is important to note that this debate is over-simplified. When you support more and more devices and/or partnerships, it quickly becomes clear that API providers need to be nimble enough to handle a wide range of formats. For example, some devices perform better with hierarchical JSON and others with flat object models, others may require specific XML or JSON schemas, some devices prefer full document delivery and others prefer streaming bits, etc. Finally, this should be a non-issue from an architectural perspective for the API providers… Adding new outputs should be easy and if it is not, you have a problem with your system’s architecture.
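One way to read the point about output formats being an architectural non-issue is to keep the data separate from its serialization, so that a new format is just one more registered function. The following is a minimal Python sketch under that assumption. The `RENDERERS` registry, the `render` function, and the sample record are hypothetical illustrations, not the Netflix implementation.

```python
import json
import xml.etree.ElementTree as ET

def to_json(record: dict) -> str:
    """Serialize a flat record as a JSON object."""
    return json.dumps(record, sort_keys=True)

def to_xml(record: dict) -> str:
    """Serialize a flat record as a <title> element with one child per field."""
    root = ET.Element("title")
    for key in sorted(record):
        child = ET.SubElement(root, key)
        child.text = str(record[key])
    return ET.tostring(root, encoding="unicode")

# Hypothetical registry: supporting another output means adding one entry.
RENDERERS = {"json": to_json, "xml": to_xml}

def render(record: dict, fmt: str) -> str:
    """Render the same record in whichever format the caller asks for."""
    try:
        return RENDERERS[fmt](record)
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None

if __name__ == "__main__":
    sample = {"id": 42, "name": "Example Title"}
    print(render(sample, "json"))
    print(render(sample, "xml"))
```

In this arrangement the code that fetches the data never needs to know which formats exist, which is roughly what "adding new outputs should be easy" would look like in practice.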
  • Another common debate is REST vs. SOAP. Clearly, this comparison demonstrates the ease of REST over SOAP.
  • And this chart from ProgrammableWeb shows that the ease of use translates into conversions from SOAP to REST. This revolution is in full swing, in fact, to the point where REST is fully entrenched as the current assumption. It is also important to note that going from SOAP to REST is a true revolution (as defined by Kuhn) as it results in throwing out one for the other.
  • My favorite debate is Open vs. Closed APIs (a.k.a. Public vs. Private).
  • According to ProgrammableWeb (again), the growth of APIs has been tremendous. The APIs tracked, however, are largely open/public APIs, and that is a small subset of the number of APIs that companies use every day.
  • One way to represent this is with an iceberg. At the top, above the water line, is the smallest part of the iceberg. This represents the open APIs, as both are highly visible. But the majority of the iceberg’s mass is under the water line, where it is not visible. That larger underwater mass equates to the closed APIs: the ones you are most likely not aware of account for the vast majority of the APIs that exist, along with the vast majority of the traffic and consumption.
  • In fact, many companies who have opened up their content to the world have seen tremendous traffic from internal services relative to their public feeds or APIs. These five companies all have public APIs, but the overwhelming traffic comes from their branded applications built internally or through direct partnerships. This is another example of a revolution where companies are discounting their focus on public APIs in favor of their private APIs. Revolutions such as these, however, take time…
  • Ultimately, the audience for the API should be the prime driver for its design. The debates presume an objective right or wrong way, which can only be determined against the audience.
  • Another way to think about this is that the public API is a product in and of itself. The private API is a means to an end. With the trends towards private APIs, the focus of those APIs should then move towards supporting the business goals.
  • Again, as the focus becomes more and more towards internal consumption, these are some of the key focal points for API providers moving forward.
  • To demonstrate these focal points, I will use Netflix as an example.
  • But first, the question that needs to be considered is whether or not Netflix is an early anomaly that will lead to a revolution…
  • Or are we just a black sheep?
  • All of this started with the launch of streaming in 2007. At the time, we were only streaming on computer-based players (i.e., no devices, mobile phones, etc.). The content was also not yet fully liberated.
  • Shortly after streaming launched, in 2008, we launched our REST API. I describe it as a One-Size-Fits-All (OSFA) type of implementation because the API itself sets the rules and requires anyone who interfaces with it to adhere to those rules. Everyone is treated the same.
  • The OSFA API launched to support the 1,000 flowers model. That is, we would plant the seeds in the ground (by providing access to our content) and see what flowers sprout up in the myriad fields throughout the US. The 1,000 flowers are public API developers. At the launch of the public API, the content was fully liberated and the bird was set free to fly around in the open world.
  • And at launch, the API was exclusively targeted towards and consumed by the 1,000 flowers (i.e., external developers). So all of the API traffic was coming from them.
  • Some examples of the flowers…
  • But as streaming gained more steam…
  • The API evolved to support more of the devices that were getting built. The 1,000 flowers were still supported as well, but as the devices ramped up, they became a bigger focus. And the bird was now mostly flying around the house with occasional visits to the open world.
  • With the adoption of the devices, API traffic took off! We went from about 600 million requests per month to about 42 BILLION requests in just two years. And it has grown substantially more since then.
  • Meanwhile, the balance of requests by audience had completely flipped. Overwhelmingly, the majority of traffic was coming from Netflix-ready devices and a shrinking percentage was from the 1,000 flowers. The rough distribution is now more than 1000-to-1 in favor of internal consumption.
  • We now have more than 36 million global subscribers in more than 50 countries and territories.
  • Those subscribers consume more than a billion hours of streaming video a month, which accounts for 33% of the peak Internet traffic in North America.
  • All 36 million of Netflix’s subscribers are watching shows and movies on virtually any device that has a streaming video screen. We are now on more than 1000 different device types, most of which are supported by the Netflix API, to be discussed throughout this presentation.
  • As our product has evolved over the years, so has the engineering organization. The organization that develops the product is basically shaped like an hourglass.
  • In the top end of the hourglass, we have our device and UI teams who build out great user experiences on Netflix-branded devices. There are currently more than 1000 different device types that we support. To put that into perspective, we support a few hundred more device types than there are engineers at Netflix.
  • At the bottom end of the hourglass, there are several dozen dependency teams who focus on things like metadata, algorithms, authentication services, A/B test engines, etc.
  • The API is at the center of the hourglass, acting as a broker of data.
  • Given that position in the stack, it is easy to see that the weight of our product, for better or worse, relies on a solid foundation in the API layer.
  • So, the API needs to adopt a new set of criteria for supporting the business. We no longer need to focus on attracting external developers, building communities, etc. (all exercises that point more towards the top end of the iceberg). Rather, we need to focus our efforts towards being the means to the end, or the bottom end of the iceberg. These are the things needed to be great at being the means.
  • Our new design has been predicated on internal API consumers with a focus towards simplifying the way in which consumers interact with the API, rather than how simple it is for us to administer it.
  • Again, our API traffic grew tremendously over the last few years.
  • Today, we are doing more than 2B incoming requests per day. That kind of growth and those kinds of numbers seem great. Who wouldn’t want those numbers, right?
  • Especially if you are an organization like ESPN serving web pages that have ads on them. If espn.com was serving 2B requests a day, each one of those requests would create impressions for the ad which translates into revenue (and potentially increased CPM at those levels).
  • But the API traffic is not serving pages with ads. Rather, we are delivering documents like this, in the form of XML…
  • Or like this, in the form of JSON.
  • Growth in traffic, especially if it were to continue at this rate, does not directly translate into revenue. Instead, it is more likely to translate into costs. Supporting massive traffic requires major infrastructure to support the load, expenses in delivering the bits, engineering costs to build and support more complex systems, etc.
  • So our first realization was that we could potentially significantly reduce the chattiness between the devices and the API while maintaining the same or better user experience. Rather than handling 2 billion requests per day, could we have the same UI at 300 million instead? Or less? Could having more optimized delivery of the metadata improve the performance and experience for our customers as well?
  • For example, screen size could significantly affect what the API should deliver to the UI. TVs with bigger screens can potentially fit more titles and more metadata per title than a mobile phone. Do we need to send all of the extra bits for fields or items that are not needed, requiring the device itself to drop items on the floor? Or can we optimize the delivery of those bits on a per-device basis?
  • Different devices have different controlling functions as well. For devices with swipe technologies, such as the iPad, do we need to pre-load a lot of extra titles in case a user swipes the row quickly to see the last of 500 titles in their queue? Or for up-down-left-right controllers, would devices be more optimized by fetching a few items at a time when they are needed? Other devices support voice or hand gestures or pointer technologies. How might those impact the user experience and therefore the metadata needed to support them?
  • The technical specs on these devices differ greatly. Some have significant memory space while others do not, impacting how much data can be handled at a given time. Processing power and hard-drive space could also play a role in how the UI performs, in turn potentially influencing the optimal way for fetching content from the API. All of these differences could result in different potential optimizations across these devices.
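The per-device questions above (screen size, input style, memory) can be pictured as a small table of delivery settings that the API consults before it builds a response. This Python sketch is illustrative only. The `DeviceProfile` type, the field names, and the page sizes are assumptions made for the example, not values from the presentation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical per-device delivery settings."""
    fields: tuple    # metadata fields the UI actually renders
    page_size: int   # how many titles to send per request

# Illustrative values only; real settings would come from testing each device.
PROFILES = {
    "tv": DeviceProfile(fields=("id", "name", "synopsis", "box_art"), page_size=40),
    "phone": DeviceProfile(fields=("id", "name", "box_art"), page_size=10),
}

def shape_row(titles: list, device: str) -> list:
    """Trim a row of titles to the fields and count that the device needs."""
    profile = PROFILES[device]
    return [
        {field: title[field] for field in profile.fields if field in title}
        for title in titles[: profile.page_size]
    ]
```

With this shape, a phone never receives the synopsis or cast fields it would otherwise have to drop on the floor, and a device that pages a few items at a time gets a smaller `page_size`.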
  • Many UI teams needing metadata means many requests to the API team. In the OSFA world, we essentially needed to funnel these requests and then prioritize them. That means that some teams would need to wait for API work to be done. It also meant that, because they all shared the same endpoints, we were often adding variations to the endpoints, resulting in a more complex system as well as a lot of spaghetti code. Making teams wait due to prioritization was exacerbated by the fact that tasks took longer because the technical debt was increasing, causing time to build and test to increase. Moreover, many of the incoming requests were asking us to do more of the same kinds of customizations. This created a spiral that would be very difficult to break out of…
  • All of these aforementioned issues are essentially anomalies in the current OSFA paradigm. For us, these anomalies carve a path for a revolution (meaning, an opportunity for us to overthrow our current OSFA paradigm with a solution that makes up for the OSFA deficiencies).
  • We evolved our discussion towards what ultimately became a discussion between resource-based APIs and experience-based APIs.
  • The original OSFA API was very resource oriented with granular requests for specific data, delivering specific documents in specific formats.
  • The interaction model looked basically like this, with (in this example) the PS3 making many calls across the network to the OSFA API. The API ultimately called back to dependent services to get the corresponding data needed to satisfy the requests.
  • And ultimately, it works. The PS3 interface looks like this and was populated by this interaction model.
  • But we believe this is not the optimal way to handle it. In fact, assembling a UI through many resource-based API calls is akin to pointillism paintings. The picture looks great when fully assembled, but it is done by assembling many points put together in the right way.
  • We have decided to pursue an experience-based approach instead. Rather than making many API requests to assemble the PS3 home screen, the PS3 will potentially make a single request to a custom, optimized endpoint.
  • If resource-based APIs assemble data like pointillism, experience-based APIs assemble data like a photograph. The experience-based approach captures and delivers it all at once.
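The difference between the two interaction models can be made concrete by counting round trips between the device and the API. The Python sketch below uses an in-memory stand-in for the dependent services. The class names, endpoint names, and sample data are all hypothetical and only serve to contrast client-side assembly with a single server-assembled response.

```python
class Backend:
    """Stand-in for the dependent services behind the API."""
    def __init__(self):
        self.user_lists = {"u1": ["top10", "queue"]}
        self.list_titles = {"top10": [1, 2], "queue": [3]}
        self.titles = {1: "Title A", 2: "Title B", 3: "Title C"}

class CountingApi:
    """Hypothetical API facade that counts device-to-API round trips."""
    def __init__(self, backend):
        self.backend = backend
        self.network_calls = 0

    # Resource-based endpoints: one granular resource per request.
    def get_lists(self, user):
        self.network_calls += 1
        return self.backend.user_lists[user]

    def get_list(self, list_id):
        self.network_calls += 1
        return self.backend.list_titles[list_id]

    def get_title(self, title_id):
        self.network_calls += 1
        return self.backend.titles[title_id]

    # Experience-based endpoint: the server assembles the whole screen.
    def get_home_screen(self, user):
        self.network_calls += 1
        b = self.backend
        return {
            list_id: [b.titles[t] for t in b.list_titles[list_id]]
            for list_id in b.user_lists[user]
        }

def home_screen_from_resources(api, user):
    """Client-side assembly from many small calls (the pointillism model)."""
    return {
        list_id: [api.get_title(t) for t in api.get_list(list_id)]
        for list_id in api.get_lists(user)
    }
```

For this toy data, the resource-based path makes six requests (one for the lists, two for list contents, three for titles), while `get_home_screen` returns the same structure in one. The lookups still happen, but they move behind the API, where they do not cross the device's network connection.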
  • Another key point in our new implementation is that we are not versioning this API in the way that most APIs consider versioning. Rather, we have more of a deprecation model.
  • The problem with versioning, particularly in supporting as many devices as Netflix does, is that many of these devices cannot be updated. And in the case of TVs, for example, they sit on people’s walls for 7-10 years with limited (if even possible) options for updating the app. As a result, any API version that is published that a TV app calls needs to be supported for that long duration.
  • Ultimately, you may end up supporting a large, and growing, number of API versions. The more you support, the tougher it is to maintain and the greater the burden it places on your innovation. Right now, Netflix has a 1.0, 1.5 and 2.0 API. You can quickly imagine in the next 10 years the possibility of a 3.0, 4.0, 5.0, 6.0, etc., making the codebase daunting, ugly and brittle.
  • Rather, we wanted/needed to get away from the slippery slope of versioning so we can have a more sustainable model moving forward. So, when a Java API change is needed, rather than spinning up a new version, we add the new methods then work closely with the UI teams so they can adopt the new APIs. Quick adoption enables us to deprecate the old ones quickly.
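The notes describe this deprecation model for Java methods. The same idea is sketched here in Python for brevity: the new method is added next to the old one, and the old one keeps working while signalling that callers should migrate. `CatalogApi` and both method names are hypothetical.

```python
import warnings

class CatalogApi:
    """Hypothetical server-side API exposed to the UI teams."""

    def get_titles_page(self, user_id: str, limit: int = 10) -> list:
        """New method, added alongside the old one instead of a new version."""
        return [f"title-{i}" for i in range(limit)]

    def get_titles(self, user_id: str) -> list:
        """Old method: still works, but tells callers to move off it."""
        warnings.warn(
            "get_titles is deprecated; use get_titles_page",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_titles_page(user_id, limit=10)
```

Once the UI teams have adopted `get_titles_page`, the old method can be deleted outright, so the codebase carries one implementation rather than a growing stack of versions.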
  • In terms of revolutions, Netflix may just be a lone anomaly that will be cast away as just that. Given my many conversations with other API providers, however, I suspect that the anomalies encountered with the OSFA APIs are becoming more pervasive. This will likely result in a broader revolution at some point in the future (who knows when…). That said, this design is not for everyone, even if you are experiencing the anomalies that I have discussed. Here is a recipe for those to which something like this could apply…
  • And don’t forget a generous helping of chocolate for your engineers!
  • Because the business relies on a stable API, we need to have robust systems for resiliency, scaling and insights. This, along with the practices from a range of other teams, helps us better protect Netflix customers from failures.
  • At Netflix, we have a range of engineering teams who focus on specific problem sets. Some teams focus on creating rich presentation layers on various devices. Others focus on metadata and algorithms. For the streaming application to work, the metadata from the services needs to make it to the devices. That is where the API comes in. The API essentially acts as a broker, moving the metadata from inside the Netflix system to the devices in customers’ homes.
  • Given the position of the API within the overall system, the API depends on a large number of underlying systems (only some of which are represented here). Moreover, a large number of devices depend on the API (only some of which are represented here). Sometimes, one of these underlying systems experiences an outage.
  • In the past, such an outage could result in an outage in the API.
  • And if that outage cascades to the API, it is likely to have some kind of substantive impact on the devices. The challenge for the API team is to be resilient against dependency outages, to ultimately insulate Netflix customers from low level system problems.
  • To protect our customers from this problem, we created Hystrix (which is now available on our Open Source site). Hystrix is a fault tolerance and resiliency wrapper that isolates dependencies and allows us to treat them differently as problems arise.
  • This is a view of the dashboard that shows some of our dependencies. This dashboard, as well as Turbine, is available on our open source site as well. This dashboard is used as a visualization of the health and traffic of each dependency.
  • This is a view of a single circuit.
  • This circle represents the call volume and health of the dependency over the last 10 seconds. This circle is meant to be a visual indicator for health. The circle is green for healthy, yellow for borderline, and red for unhealthy. Moreover, the size of the circle represents the call volumes, where bigger circles mean more traffic.
  • The blue line represents the traffic trends over the last two minutes for this dependency.
  • The green number shows the number of successful calls to this dependency over the last two minutes.
  • The yellow number shows the number of latent calls into the dependency. These calls ultimately return successful responses, but slower than expected.
  • The blue number shows the number of calls that were handled by the short-circuited fallback mechanisms. That is, if the circuit gets tripped, the blue number will start to go up.
  • The orange number shows the number of calls that have timed out, resulting in fallback responses.
  • The purple number shows the number of calls that fail due to queuing issues, resulting in fallback responses.
  • The red number shows the number of exceptions, resulting in fallback responses.
  • The error rate is calculated from the total number of error and fallback responses divided by the total number of calls handled.
  • If the error rate exceeds a certain number, the circuit to the fallback scenario is automatically opened. When it returns below that threshold, the circuit is closed again.
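The error-rate rule can be written down directly from the dashboard counters. The Python sketch below is an illustration and not Hystrix's own code. The `CircuitCounts` type is hypothetical, latent calls are assumed to be counted as successes, and the 0.5 threshold is a placeholder because the notes only say "a certain number".

```python
from dataclasses import dataclass

@dataclass
class CircuitCounts:
    """Call counters for one dependency over a rolling window."""
    success: int = 0          # includes slow-but-successful (latent) calls
    short_circuited: int = 0  # served by fallback because the circuit was open
    timed_out: int = 0
    rejected: int = 0         # failed due to queuing issues
    failed: int = 0           # exceptions

    @property
    def total(self) -> int:
        return (self.success + self.short_circuited + self.timed_out
                + self.rejected + self.failed)

    @property
    def error_rate(self) -> float:
        """Error and fallback responses divided by all calls handled."""
        if self.total == 0:
            return 0.0
        errors = (self.short_circuited + self.timed_out
                  + self.rejected + self.failed)
        return errors / self.total

def circuit_should_open(counts: CircuitCounts, threshold: float = 0.5) -> bool:
    """True when the error rate exceeds the (hypothetical) threshold."""
    return counts.error_rate > threshold
```

For example, 95 successes with 3 timeouts and 2 exceptions gives an error rate of 5% and leaves the circuit closed, while 40 successes against 60 fallback responses gives 60% and opens it.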
  • The dashboard also shows host and cluster information for the dependency…
  • As well as information about our SLAs.
  • So, going back to the engineering diagram…
  • If that same service fails today…
  • We simply disconnect from that service.
  • And replace it with an appropriate fallback. In some cases, the fallback is a degraded set of data, in other cases it could be a fast fail 5xx response code. In all cases, our goal is to ensure that an issue in one service does not result in queued up requests that can create further latencies and ultimately bring down the entire system.
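The disconnect-and-fall-back behaviour can be sketched as a thin wrapper around each dependency call. This is a simplified Python illustration with hypothetical names. It leaves out the timeouts, queues, and thread isolation that a real resiliency library would provide, and it shows only the degraded-data style of fallback.

```python
class DependencyCall:
    """Hypothetical wrapper: run a dependency call, fall back on trouble."""

    def __init__(self, run, fallback):
        self.run = run            # callable that hits the dependency
        self.fallback = fallback  # callable returning degraded data
        self.circuit_open = False

    def execute(self):
        if self.circuit_open:
            # Short-circuit: do not touch the failing service at all.
            return self.fallback()
        try:
            return self.run()
        except Exception:
            # Any failure yields the fallback instead of propagating upward.
            return self.fallback()
```

A caller might pair a personalized-recommendations lookup with a fallback that returns a generic popular list, so the device still has something to render while the dependency is down. A fast-fail variant would raise an error from `fallback` instead of returning data.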
  • Ultimately, this allows us to keep our customers happy, even if the experience may be slightly degraded. It is important to note that different dependency libraries have different fallback scenarios. And some are more resilient than others. But the overall sentiment here is accurate at a high level.
  • As a general practice, Netflix focuses on getting code into production as quickly as possible to expose features to new audiences.
  • We do spend a lot of effort testing before deploying our code. That said, we do not attempt the futile feat of trying to make our testing bullet-proof. Instead, we have adopted some new techniques to help us learn more about what the new code will look like in production. After all, there is no substitute for the variability and load that the production servers can offer (especially when handling 50k+ requests per second).
  • First and foremost, we are able to perform these techniques because of the flexibility that the AWS cloud offers us.
  • We have then built our own tools that enable us to see how healthy a range of environments are, among other things…
  • Assuming environments are healthy, we then pursue two primary techniques that help us get code into production: canary deployments and red/black deployments.
  • The canary deployments are comparable to canaries in coal mines. We have many servers in production running the current codebase.
  • We will then introduce a single (or perhaps a few) new server(s) into production running new code. Monitoring the canary servers will show what the new code will look like in production.
  • We have canary analysis tools to help us understand how healthy the canary codebase is relative to the current codebase. A range of dimensions goes into this calculation, but the final score is provided in the dial above. If green, it is ready to go. If yellow, it needs more investigation. If red, it is definitely not ready.
  • We also use dashboards and charts that show more granular data about the health of the canary.
  • If the canary encounters problems, it will register in any number of ways.
  • If the canary shows errors, we pull it/them down, re-evaluate the new code, debug it, etc.
  • We will then repeat the process until the analysis of the canary servers looks good.
  • If the new code looks good in the canary, we can then use a technique that we call red/black deployments to launch the code. Start with red, where production code is running. Fire up a new set of servers (black) equal to the count in red with the new code.
  • Then switch the pointer to have external requests point to the black servers.
  • Sometimes errors are encountered at this stage as well…
  • If a problem is encountered from the black servers, it is easy to roll back quickly by switching the pointer back to red. We will then re-evaluate the new code, debug it, etc.
  • Once we have debugged the code, we will put another canary up to evaluate the new changes in production.
  • If the new code looks good in the canary, we can then bring up another set of servers with the new code.
  • Then we will switch production traffic to the new code.
  • Then switch the pointer to have external requests draw from the black servers. If everything still looks good, we disable the red servers and the new code becomes the new red servers.
  • Throughout these steps, we have tools such as this one that show the status of the various deployments around the world.
  • Again, these are the areas of focus for the Netflix API. Based on the pending revolution around private APIs, I suspect that we will see more companies focus on these things as well.
  • But keep in mind, these revolutions happen often in technology. We are constantly on a quest to plug the leaks in our previous systems by replacing them with new, improved systems. The hope is that the paradigm shift results in fewer or smaller leaks. But make no mistake, there will be leaks and anomalies in the new system!
  • So don’t get too comfortable with any system that you support. Don’t get married to any technology, guideline, protocol, etc. They are all just means to an end.
  • And make sure you have lots of chocolate!
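The error-rate and circuit behavior described in the dashboard notes above can be sketched roughly as follows. This is a minimal illustration, not the actual Netflix implementation: the class name, field names, and the 50% threshold are all assumptions, and it assumes that latent (yellow) calls are counted among the successes because they still return successful responses.

```python
from dataclasses import dataclass


@dataclass
class DependencyCounts:
    """Rolling-window counters for one dependency (all names are illustrative)."""
    success: int = 0          # green: successful calls (latent calls assumed included)
    short_circuited: int = 0  # blue: served by the fallback while the circuit is open
    timeout: int = 0          # orange: timed out, fallback returned
    rejected: int = 0         # purple: queuing issues, fallback returned
    exception: int = 0        # red: exception raised, fallback returned

    def error_rate(self) -> float:
        """Error and fallback responses divided by all calls handled."""
        errors = self.short_circuited + self.timeout + self.rejected + self.exception
        total = self.success + errors
        return errors / total if total else 0.0


def circuit_is_open(counts: DependencyCounts, threshold: float = 0.5) -> bool:
    """Open the circuit while the error rate exceeds the (assumed) threshold."""
    return counts.error_rate() > threshold


if __name__ == "__main__":
    window = DependencyCounts(success=90, timeout=6, rejected=2, exception=2)
    print(window.error_rate(), circuit_is_open(window))
```

Once the error rate drops back below the threshold, `circuit_is_open` returns `False` again, which corresponds to the circuit closing in the notes.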
  • Transcript

    • 1. The Structure of API Revolutions. By Daniel Jacobson, @daniel_jacobson. Image courtesy of SakeThrajan
    • 2. There are copious comments on each slide for the full context from my presentation
    • 3. Scientific Discovery: Predominant View, Time
    • 4. Eventually
    • 5. Scientific Practice: Kuhn’s View, Time, Assumption
    • 6. Image courtesy of Niko Lang
    • 7. Scientific Practice: Kuhn’s View, Experiments on Current Assumption, Time, Assumption
    • 8. Scientific Practice: Kuhn’s View, Anomalies from Experiments, Experiments on Current Assumption, Time, Assumption
    • 9. Phases of Venus
    • 10. Scientific Practice: Kuhn’s View, New Assumption, Anomalies from Experiments, Experiments on Current Assumption, Assumption, Assumption, Time
    • 11. Image courtesy of Niko Lang
    • 12. Scientific Practice: Kuhn’s View, Scientific Revolution (aka Paradigm Shift), New Assumption, Anomalies from Experiments, Experiments on Current Assumption, Time, Assumption, Assumption
    • 13. Scientific Practice: Kuhn’s View, Assumption, Assumption, Assumption, Assumption, Assumption, Assumption, Assumption, Assumption, New Assumption, Anomalies from Experiments, Experiments on Current Assumption, Time
    • 14. The Structure of API Revolutions
    • 15. Debate : XML vs. JSON
    • 16. Debate : XML vs. JSON. Courtesy of APIHub
    • 17. Debate : XML vs. JSON. Courtesy of ProgrammableWeb
    • 18. Debate : XML vs. JSON. This debate is over-simplified!
    • 19. Debate : REST vs. SOAP
    • 20. Debate : REST vs. SOAP. Courtesy of ProgrammableWeb
    • 21. Debate : Public vs. Private
    • 22. Courtesy of ProgrammableWeb
    • 23. Partners
    • 24. My View on These Kinds of Debates? Who Cares?!?!
    • 25. What do I care about?
    • 26. My Audience!
    • 27. End in itself / Means to an end
    • 28. Emerging Focus for the API Industry • Internal API Consumers • API Consumer Simplicity Over API Provider Simplicity • Scaling • Resiliency • Tools and Insights • Testing and Automation
    • 29. Brief Look at Netflix API History
    • 30. 2007
    • 31. Netflix REST API: One-Size-Fits-All (OSFA) Solution
    • 32. Image courtesy of Jay Mac 3 on Flickr
    • 33. Netflix API Requests by Audience, At Launch in 2008: External Developers
    • 34. Image courtesy of Jay Mac 3 on Flickr
    • 35. Growth of Netflix API Requests [chart: requests in billions; Jan-10: 0.6, Jan-11: 20.7, Jan-12: 41.7] 70x growth in two years!
    • 36. Netflix API Requests by Audience, From 2011: External Developers
    • 37. More than 36 Million Subscribers. More than 50 Countries & Territories
    • 38. Netflix Accounts for 33% of Peak Internet Traffic in North America. Netflix subscribers are watching more than 1 billion hours a month
    • 39. 1,000+ Different Device Types
    • 40. 1000+ Device Types
    • 41. Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine. Dozens of Dependencies
    • 42. Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, API, Reviews, A/B Test Engine
    • 43. API [chart: requests in billions; Jan-10: 0.6, Jan-11: 20.7, Jan-12: 41.7]
    • 44. Emerging Focus for the API Industry • Internal API Consumers • API Consumer Simplicity Over API Provider Simplicity • Scaling • Resiliency • Tools and Insights • Testing and Automation
    • 45. New Audience = New Charter
    • 46. New Charter = New Design
    • 47. Internal API Consumers. API Consumer Simplicity
    • 48. New Design. Focused on three key themes: • Chattiness • Variability across devices • Innovation rates
    • 49. New Design. Focused on three key themes: • Chattiness • Variability across devices • Innovation rates
    • 50. Growth of Netflix API Requests [chart: requests in billions; Jan-10: 0.6, Jan-11: 20.7, Jan-12: 41.7] 70x growth in two years!
    • 51. Growth of the Netflix API. > 2 billion requests per day. Exploding out to > 14 billion dependency calls per day
    • 52. <catalog_titles><number_of_results>1140</number_of_results><start_index>0</start_index><results_per_page>10</results_per_page><catalog_title><id>http://api.netflix.com/catalog/titles/movies/60021896</id><title short="Star" regular="Star"></title><box_art small="http://alien2.netflix.com/us/boxshots/tiny/60021896.jpg"medium="http://alien2.netflix.com/us/boxshots/small/60021896.jpg"large="http://alien2.netflix.com/us/boxshots/large/60021896.jpg"></box_art><link href="http://api.netflix.com/catalog/titles/movies/60021896/synopsis"rel="http://schemas.netflix.com/catalog/titles/synopsis" title="synopsis"></link><release_year>2001</release_year><category scheme="http://api.netflix.com/catalog/titles/mpaa_ratings" label="NR"></category><category scheme="http://api.netflix.com/categories/genres" label="Foreign"></category><link href="http://api.netflix.com/catalog/titles/movies/60021896/cast"rel="http://schemas.netflix.com/catalog/people.cast" title="cast"></link><link href="http://api.netflix.com/catalog/titles/movies/60021896/screen_formats" rel="http://schemas.netflix.com/catalog/titles/screen_formats" title="screen formats"></link<link href="http://api.netflix.com/catalog/titles/movies/60021896/languages_and_audio" rel="http://schemas.netflix.com/catalog/titles/languages_and_audio" title="languages and audio"></link><average_rating>1.9</average_rating><link href="http://api.netflix.com/catalog/titles/movies/60021896/similars" rel="http://schemas.netflix.com/catalog/titles.similars" title="similars"></link><link href="http://www.netflix.com/Movie/Star/60021896" rel="alternate" title="webpage"></link></catalog_title><catalog_title><id>http://api.netflix.com/catalog/titles/movies/17985448</id><title short="Lone Star" regular="Lone Star"></title><box_art small="http://alien2.netflix.com/us/boxshots/tiny/17985448.jpg" medium="http://alien2.netflix.com/us/boxshots/small/17985448.jpg" large=""></box_art><link 
href="http://api.netflix.com/catalog/titles/movies/17985448/synopsis" rel="http://schemas.netflix.com/catalog/titles/synopsis" title="synopsis"></link><release_year>1996</release_year><category scheme="http://api.netflix.com/catalog/titles/mpaa_ratings" label="R"></category><category scheme="http://api.netflix.com/categories/genres" label="Drama"></category><link href="http://api.netflix.com/catalog/titles/movies/17985448/awards" rel="http://schemas.netflix.com/catalog/titles/awards" title="awards"></link><link href="http://api.netflix.com/catalog/titles/movies/17985448/format_availability" rel="http://schemas.netflix.com/catalog/titles/format_availability" title="formats"></link><link href="http://api.netflix.com/catalog/titles/movies/17985448/screen_formats" rel="http://schemas.netflix.com/catalog/titles/screen_formats" title="screen formats"></link><link href="http://api.netflix.com/catalog/titles/movies/17985448/languages_and_audio" rel="http://schemas.netflix.com/catalog/titles/languages_and_audio" title="languages and audio"></link><average_rating>3.7</average_rating><link href="http://api.netflix.com/catalog/titles/movies/17985448/previews" rel="http://schemas.netflix.com/catalog/titles/previews" title="previews"></link><link href="http://api.netflix.com/catalog/titles/movies/17985448/similars" rel="http://schemas.netflix.com/catalog/titles.similars" title="similars"></link><link href="http://www.netflix.com/Movie/Lone_Star/17985448" rel="alternate" title="webpage"></link></catalog_title></catalog_titles>
    • 53. {"catalog_title":{"id":"http://api.netflix.com/catalog/titles/movies/60034967","title":{"title_short":"Rosencrantz and Guildenstern Are Dead","regular":"Rosencrantz and Guildenstern Are Dead"},"maturity_level":60,"release_year":"1990","average_rating":3.7,"box_art":{"284pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/ghd/60034967.jpg","110pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/large/60034967.jpg","38pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/tiny/60034967.jpg","64pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/small/60034967.jpg","150pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/150/60034967.jpg","88pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/88/60034967.jpg","124pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/124/60034967.jpg"},"language":"en","web_page":"http://www.netflix.com/Movie/Rosencrantz_and_Guildenstern_Are_Dead/60034967","tiny_url":"http://movi.es/ApUP9"},"meta":{"expand":["@directors","@bonus_materials","@cast","@awards","@short_synopsis","@synopsis","@box_art","@screen_formats","@"links":{"id":"http://api.netflix.com/catalog/titles/movies/60034967","languages_and_audio":"http://api.netflix.com/catalog/titles/movies/60034967/languages_and_audio","title":"http://api.netflix.com/catalog/titles/movies/60034967/title","screen_formats":"http://api.netflix.com/catalog/titles/movies/60034967/screen_formats","cast":"http://api.netflix.com/catalog/titles/movies/60034967/cast","awards":"http://api.netflix.com/catalog/titles/movies/60034967/awards","short_synopsis":"http://api.netflix.com/catalog/titles/movies/60034967/short_synopsis","box_art":"http://api.netflix.com/catalog/titles/movies/60034967/box_art","synopsis":"http://api.netflix.com/catalog/titles/movies/60034967/synopsis","directors":"http://api.netflix.com/catalog/titles/movies/60034967/directors","similars":"http://api.netflix.com/catalog/titles/movies/60034967/similars","format_availability":"http://api.netflix.com/catalog/titles/movies/60034967/format_availability"}}}
    • 54. What if the API request growth rate looks like this??? [chart: requests in billions, axis from 0 to 160] Is this good for the long run???
    • 55. Improve Efficiency of API Requests. Could it have been 300 million requests per day? Or less? (Assuming everything else remained the same)
    • 56. New Design. Focused on three key themes: • Chattiness • Variability across devices • Innovation rates
    • 57. Screen Real Estate
    • 58. Controller
    • 59. Technical Capabilities
    • 60. New Design. Focused on three key themes: • Chattiness • Variability across devices • Innovation rates
    • 61. One-Size-Fits-All API: Request, Request, Request
    • 62. Our Solution…
    • 63. Move away from the One-Size-Fits-All API model
    • 64. Resource-Based API vs. Experience-Based API
    • 65. Resource-Based Requests • /users/<id>/ratings/title • /users/<id>/queues • /users/<id>/queues/instant • /users/<id>/recommendations • /catalog/titles/movie • /catalog/titles/series • /catalog/people
    • 66. OSFA API: RECOMMENDATIONS, MOVIE DATA, SIMILAR MOVIES, AUTH, MEMBER DATA, A/B TESTS, START-UP, RATINGS. Network Border, Network Border
    • 67. OSFA API: RECOMMENDATIONS, MOVIE DATA, SIMILAR MOVIES, AUTH, MEMBER DATA, A/B TESTS, START-UP, RATINGS. Network Border, Network Border. SERVER CODE, CLIENT CODE
    • 68. OSFA API: RECOMMENDATIONS, MOVIE DATA, SIMILAR MOVIES, AUTH, MEMBER DATA, A/B TESTS, START-UP, RATINGS. Network Border, Network Border. DATA GATHERING, FORMATTING, AND DELIVERY; USER INTERFACE RENDERING
    • 69. Experience-Based Requests • /ps3/homescreen
    • 70. JAVA API: RECOMMENDATIONS, MOVIE DATA, SIMILAR MOVIES, AUTH, MEMBER DATA, A/B TESTS, START-UP, RATINGS. Network Border, Network Border. Groovy Layer
    • 71. JAVA API: RECOMMENDATIONS, MOVIE DATA, SIMILAR MOVIES, AUTH, MEMBER DATA, A/B TESTS, START-UP, RATINGS. Groovy Layer. SERVER CODE, CLIENT CODE, CLIENT ADAPTER CODE (WRITTEN BY CLIENT TEAMS, DYNAMICALLY UPLOADED TO SERVER). Network Border, Network Border
    • 72. JAVA API: RECOMMENDATIONS, MOVIE DATA, SIMILAR MOVIES, AUTH, MEMBER DATA, A/B TESTS, START-UP, RATINGS. Groovy Layer. DATA GATHERING; DATA FORMATTING AND DELIVERY; USER INTERFACE RENDERING. Network Border, Network Border
    • 73. Versionless API
    • 74. Average Life of a TV : About 7-10 Years
    • 75. Versioning for APIs: 1.0, 1.5, 2.0, Today, 3.0?, 4.0?, 5.0? (timeline: 2008 through 2020)
    • 76. Versioning for APIs: 1.0, 1.5, 2.0, Today (timeline: 2008 through 2020)
    • 77. Benefit to Thinking Versionless • If you can achieve it, maintenance will be MUCH simpler • If you cannot, it instills better practices • Reduces lazy programming • Results in fewer versions • Results in a cleaner, less brittle system. REMEMBER: Adding new features typically does not require a new version (structural changes and removals do)
    • 78. Recipe for Targeted APIs. API providers that have a: 1. small number of targeted API consumers 2. very close relationships with API consumers 3. increasing divergence of needs across these API consumers 4. strong desire for optimization by the API consumers 5. optimized APIs offer high value proposition
    • 79. Recipe for Targeted APIs. API providers that have a: 1. small number of targeted API consumers 2. very close relationships with API consumers 3. increasing divergence of needs across these API consumers 4. strong desire for optimization by the API consumers 5. optimized APIs offer high value proposition 6. a generous helping of chocolate (to keep engineers happy)
    • 80. Resiliency, Scaling and Insights: Protect the Customer
    • 81. Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, API, Reviews, A/B Test Engine
    • 82. Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, API, Reviews, A/B Test Engine
    • 83. Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, API, Reviews, A/B Test Engine
    • 84. Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, API, Reviews, A/B Test Engine
    • 85. Circuit Breaker Dashboard & Turbine
    • 86. Call Volume and Health / Last 10 Seconds
    • 87. Call Volume / Last 2 Minutes
    • 88. Successful Requests
    • 89. Successful, But Slower Than Expected
    • 90. Short-Circuited Requests, Delivering Fallbacks
    • 91. Timeouts, Delivering Fallbacks
    • 92. Thread Pool & Task Queue Full, Delivering Fallbacks
    • 93. Exceptions, Delivering Fallbacks
    • 94. Error Rate: (# + # + # + #) / (# + # + # + # + #) = Error Rate
    • 95. Status of Fallback Circuit
    • 96. Requests per Second, Over Last 10 Seconds
    • 97. SLA Information
    • 98. Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, API, Reviews, A/B Test Engine
    • 99. Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, API, Reviews, A/B Test Engine
    • 100. Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, API, Reviews, A/B Test Engine
    • 101. Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, API, Reviews, A/B Test Engine. Fallback
    • 102. Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, API, Reviews, A/B Test Engine. Fallback
    • 103. Development / Testing Philosophy: Act fast, react fast
    • 104. That Doesn’t Mean We Don’t Test • Unit tests • Functional tests • Regression scripts • Continuous integration • Capacity planning • Load / Performance tests
    • 105. AWS Cloud
    • 106. Environment Health Insights
    • 107. Cloud-Based Deployment Techniques
    • 108. Current Code In Production. API Requests from the Internet
    • 109. Single Canary Instance To Test New Code with Production Traffic (around 1% or less of traffic). Current Code In Production. API Requests from the Internet
    • 110. Canary Analysis Insights
    • 111. Canary Health Insights
    • 112. Single Canary Instance To Test New Code with Production Traffic (around 1% or less of traffic). Current Code In Production. API Requests from the Internet. Error!
    • 113. Current Code In Production. API Requests from the Internet
    • 114. Current Code In Production. API Requests from the Internet. Perfect!
    • 115. Current Code In Production. API Requests from the Internet. New Code Getting Prepared for Production
    • 116. Current Code In Production. API Requests from the Internet. New Code Getting Prepared for Production
    • 117. Error! Current Code In Production. API Requests from the Internet. New Code Getting Prepared for Production
    • 118. Current Code In Production. API Requests from the Internet. New Code Getting Prepared for Production
    • 119. Current Code In Production. API Requests from the Internet. Perfect!
    • 120. Current Code In Production. API Requests from the Internet. New Code Getting Prepared for Production
    • 121. Current Code In Production. API Requests from the Internet. New Code Getting Prepared for Production
    • 122. API Requests from the Internet. New Code Getting Prepared for Production
    • 123. Deployment Status Insights
    • 124. Netflix API Focus • Internal API Consumers • API Consumer Simplicity Over API Provider Simplicity • Scaling • Resiliency • Tools and Insights • Testing and Automation
    • 125. Image courtesy of johnt HDRcreme
    • 126. Image courtesy of KK+ on Flickr
    • 127. Image courtesy of Mars
    • 128. The Structure of API Revolutions. Image courtesy of SakeThrajan. @daniel_jacobson, djacobson@netflix.com, http://www.linkedin.com/in/danieljacobson, http://www.slideshare.net/danieljacobson
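The red/black deployment flow shown in slides 115 through 122 can be sketched as a pointer switch between two clusters. This is a simplified model under stated assumptions: `Cluster`, `RedBlackDeployer`, `deploy`, and the `is_healthy` callback are hypothetical names, the health check stands in for the monitoring tools mentioned in the talk, and the canary step is assumed to have passed before `deploy` is called.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Cluster:
    version: str
    servers: int


class RedBlackDeployer:
    def __init__(self, red: Cluster) -> None:
        self.red = red                          # cluster running current production code
        self.black: Optional[Cluster] = None    # cluster running the new code, if any
        self.live = red                         # the "pointer" for external requests

    def launch_black(self, new_version: str) -> Cluster:
        # Fire up a new set of servers equal to the count in red.
        self.black = Cluster(new_version, self.red.servers)
        return self.black

    def switch_to_black(self) -> None:
        if self.black is None:
            raise RuntimeError("no black cluster has been launched")
        self.live = self.black

    def rollback(self) -> None:
        # Point external requests back at red; black is left up for debugging.
        self.live = self.red

    def finalize(self) -> None:
        # Disable the old red servers; black becomes the new red.
        if self.black is None or self.live is not self.black:
            raise RuntimeError("traffic is not on the black cluster")
        self.red, self.black = self.black, None
        self.live = self.red


def deploy(deployer: RedBlackDeployer, new_version: str,
           is_healthy: Callable[[Cluster], bool]) -> bool:
    """Run one red/black cycle; return True if the new code stays live."""
    black = deployer.launch_black(new_version)
    deployer.switch_to_black()
    if not is_healthy(black):
        deployer.rollback()
        return False
    deployer.finalize()
    return True
```

Keeping red running until the health check passes is what makes the rollback cheap: it is only a pointer change, not a redeploy.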
