A talk about the Netflix API and how it serves as the front door for Netflix device UIs. Topics include API design, resiliency patterns, scalability, and enabling fast development and deployment cycles.
The term "scale" for engineering often is used to discuss systems and their ability to grow with the needs of its users. This is clearly an important aspect of scaling, but there are many other areas in which an engineering organization needs to scale to be successful in the long term. This presentation discusses some of those other areas and details how Netflix (and specifically the API team) addresses them.
Netflix API: Keynote at Disney Tech Conference (Daniel Jacobson)
Disney held the first in a series of internal technical conferences in Orlando, FL; this one focused entirely on APIs. These slides are from my keynote presentation, which kicked off the event. The slides focus on the Netflix API, API design, anti-patterns, technical revolutions, resiliency, scaling, test frameworks and other constructs that support the Netflix infrastructure.
Most API providers try to solve all three of the key challenges for APIs: data gathering, data formatting and data delivery. All three functions are critical to the success of an API; however, not all of them should be solved by the API provider. The API consumers have a strong, vested interest in formatting and delivery. As a result, API design should be based on a true separation of concerns between the needs of the API provider and those of the various API consumers.
This presentation explores that separation of concerns. It also explains in depth how Netflix has solved this problem through a very different approach to API design.
This presentation was given at the following API Meetup in SF:
http://www.meetup.com/API-Meetup/events/171255242/
This is a presentation that I gave to ESPN's Digital Media team about the trajectory of the Netflix API. I also discussed Netflix's device implementation strategy and how it enables rapid development and robust A/B testing.
Techniques for Scaling the Netflix API - QCon SF (Daniel Jacobson)
This presentation is from QCon SF 2011. In these slides I discuss various techniques that we use to scale the API. I also discuss in more detail our effort to redesign the API.
Scaling the Netflix API - From Atlassian Dev Den (Daniel Jacobson)
Maintaining the Netflix Front Door - Presentation at Intuit Meetup (Daniel Jacobson)
This presentation goes into detail on the key principles behind the Netflix API, including design, resiliency, scaling, and deployment. Among other things, I discuss our migration from our REST API to what we call our Experience-Based API design. It also shares several of our open source efforts, such as Zuul, Scryer, Hystrix, RxJava and the Simian Army.
This is my presentation from the Business of APIs Conference in SF, held by Mashery (http://www.apiconference.com).
This talk briefly covers the history of the Netflix API, then goes into three main categories of scaling:
1. Using the cloud to scale in size and internationally
2. Using WebKit to scale application development in parallel to the flexibility afforded by the API
3. Redesigning the API to improve performance and to downscale the infrastructure as the system scales
When viewing these slides, please note that they are almost entirely image-based, so I have added notes for each slide to detail the talking points.
APIs for Internal Audiences - Netflix - App Dev Conference (Daniel Jacobson)
API programs, typically thought of as public programs that let outside developer communities see what they can build with a company's data, are becoming more and more critical to the success of mobile and device strategies. This presentation looks at the Netflix and NPR strategies that led to tremendous growth and discusses how Netflix plans to take its internal API strategy to the next level.
This presentation demonstrates the great successes of the Netflix API to date. Some introspection, however, reveals an opportunity to better prepare the API for the future. The presentation also offers a few ideas on how the Netflix API architecture may change over time.
Revolutions in technology follow a common pattern, and the API space is no different. This presentation discusses that pattern and walks through various API revolutions. It also uses Netflix as an example of how some revolutions evolved and where things may be headed.
Set Your Content Free!: Case Studies from Netflix and NPR (Daniel Jacobson)
Last Friday (February 8th), I spoke at the Intelligent Content Conference 2013. When Scott Abel (aka The Content Wrangler) first contacted me to speak at the event, he asked me to speak about my content management and distribution experiences from both NPR and Netflix. The two experiences seemed to him to be an interesting blend for the conference. These are the slides from that presentation.
I have added notes to every slide in this presentation to provide the context that I otherwise gave verbally during the talk.
Many API programs get launched without a clear understanding as to WHY the API should exist. Rather, many are focused on WHAT the API consists of and HOW it should be targeted, implemented and leveraged. This presentation focuses on establishing the need for a clear WHY proposition behind the decision. The HOW and then WHAT will follow from that.
This presentation also uses the history of the Netflix API to demonstrate the power, utility and importance of knowing WHY you are building an API.
A high-level talk on developing APIs for mobile devices in enterprise, partner, and internal modes, with notes on our experience at Klout. Some big concepts are glossed over, and lots of details may be simplified for brevity.
As enterprises embrace APIs, some very specific Enterprise API Adoption patterns and best practices have started emerging. In this session, Laura Heritage, Principal Solutions Architect at SOA Software, will talk about the most common enterprise API patterns and will discuss how enterprises can successfully launch an API program.
The Business Value for Internal APIs in the Enterprise (Akana)
- The value of internal API programs
- How APIs and SOA fit together
- Deployment patterns for Internal APIs
- Architecture concerns about API Gateways and ESBs
The enterprise has learned from the consumer API movement and recognized the value of creating developer communities to drive the adoption and productive use of APIs. Building an API community internally, however, requires a different approach from what has worked in the consumer space. Business objectives for APIs and measurements of success tend to be different for internal APIs. Security and access controls are not the same, of course, and back-end systems tend to be quite a lot more complex in the enterprise than they are in public-facing API situations. This webinar explores the challenges and best practices inherent in building an internal API community that serves an enterprise’s business and technological goals.
What's hot in APIs? Here are 10 of the hottest trends in open APIs today. This GlueCon 2012 keynote covers monetization trends, technology trends and what makes developers love an API (hint: it's not stale documentation). These are drawn from our data and trends we're seeing at ProgrammableWeb.
Maintaining the Front Door to Netflix: The Netflix API (Daniel Jacobson)
This presentation was given to the engineering organization at Zendesk. In this presentation, I talk about the challenges that the Netflix API faces in supporting the 1000+ different device types, millions of users, and billions of transactions. The topics range from resiliency, scale, API design, failure injection, continuous delivery, and more.
This deck combines ideas from numerous visits to clients around the world. Here we show the three most common design patterns and explain their pros and cons.
The main focus of the talk is to communicate some key concepts of designing and implementing APIs based on enterprise-grade API standards and guidelines. We will try to handcraft a few API recipes (i.e., implementation designs) with real-life examples mixed with a live coding session. While working on each recipe, we will delve into the rationale behind design decisions and best practices. We believe that these concepts will help a developer build a comprehensive API solution from scratch.
I gave this presentation to the engineering team at PayPal. This presentation discusses the history and future of the Netflix API. It also goes into API design principles as well as concepts behind system scalability and resiliency.
Manage your Public API Like a Protocol (Delyn Simons)
As the number of public APIs available to developers skyrockets, developers are increasingly asked to evaluate dozens of API providers based on their market opportunity, ease of integration and stability of service - then choose a winner to integrate with. You can quickly communicate that your company is interested in providing opportunity to developers by managing your API like a protocol. Discover why good versioning practices, incorporating developer input, participating in developer meetups and hackathons, and adopting existing standards whenever possible makes good business sense for both API developers and API providers.
Developing an API strategy should be considered a journey, not a project with a predetermined outcome. This presentation describes Netflix's journey to discover a winning API strategy as well as future directions for the API.
Extend Your Use of JIRA by Solving Your Unique Concerns: An Exposé of the New... (Atlassian)
The existence of an API allows developers to extend software so as to cater for unique use cases beyond the software's original scope. Administrators and end users of JIRA 5 can expect its REST API to enable the creation of integrated applications to solve their unique concerns. This presentation aims to describe ways in which the JIRA 5 REST API can be used to make a tangible impact for the end user. Several use cases will be discussed, ranging from running simple command line apps, through to creating web applications that integrate with the JIRA 5 REST API.
Are your APIs becoming too complicated and ad hoc? Feeling the need to set up policies for your API? This presentation will give you strategy options for designing and developing your APIs.
Build an AWS Analytics Solution to Monitor the Video Streaming Experience (MA... (Amazon Web Services)
In this workshop, we build and deploy an end-to-end analytics solution for monitoring the video streaming experience. We integrate an open source video player with Amazon Kinesis Data Streams to capture events in real time. We explore the data available for capture and a variety of use cases: from generating alerts on poor experience to content recommendations based on user behavior. We also show you how this real-time data can be archived in a data lake and further used to generate reports of aggregate performance and experience across a number of dimensions.
Architectural considerations when building an API (Rod Hemphill)
A properly designed API for either mobile apps or third-party access needs to be built with maintainability, security, version control, data volume optimisation and speed in mind. Rod Hemphill from Melbourne App Development explains the options and his experience.
Petabytes of Data & No Servers: Corteva Scales DNA Analysis to Meet Increasin... (Amazon Web Services)
Corteva Agriscience, the agricultural division of DowDuPont, produces as much DNA sequence data every six hours as existed in the entire public sphere in 2008. On-premises processing and storage could not scale to meet the business demand. Partnering with Sogeti (part of Capgemini), Corteva replatformed their existing Hadoop-based genome processing systems into AWS using a serverless, cloud-native architecture. In this session, learn how Corteva Agriscience met current and future data processing demands without maintaining any long-running servers by using AWS Lambda, Amazon S3, Amazon API Gateway, Amazon EMR, AWS Glue, AWS Batch, and more. This session is brought to you by AWS partner, Capgemini America.
3. More than 48 Million Subscribers
More than 40 Countries
4. Netflix Accounts for >34% of Peak Downstream Traffic in North America
Netflix subscribers are watching more than 1 billion hours a month
5. Netflix Accounts for >6% of Peak Upstream Traffic in North America
Netflix subscribers are watching more than 1 billion hours a month
8. Team Focus: Build the Best Global Streaming Product
Three aspects of the Streaming Product:
• Non-Member
• Discovery
• Streaming
9. Netflix API : Key Responsibilities
• Broker data between services and Devices
• Provide features and business logic
• Maintain a resilient front-door
• Scale the system
• Maintain high velocity
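The slide lists maintaining a resilient front door without showing a mechanism. A common pattern for this, and the one Netflix's open source Hystrix library is built around, is the circuit breaker with a fallback. The sketch below is a minimal, single-threaded Python illustration of that pattern under stated assumptions; it is not Hystrix's API, and the class name, thresholds and injectable clock are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """After `failure_threshold` consecutive failures the circuit opens and
    calls go straight to the fallback. Once `reset_timeout` seconds have
    passed, the next call is let through as a trial: success closes the
    circuit, failure reopens it. Not thread-safe (sketch only)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast without touching the dependency
            self.opened_at = None  # timeout elapsed: allow one trial call
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback()
        self.failures = 0  # success closes the circuit
        return result
```

In a front-door service, `primary` would wrap a call to one backend dependency and `fallback` would return a degraded but usable response (for example, a hypothetical unpersonalized list), so one failing dependency does not fail the whole device request.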
12. APIs Do Lots of Things!
• Data Gathering
• Data Formatting
• Data Delivery
• Security
• Authorization
• Authentication
• System Scaling
• Discoverability
• Data Consistency
• Translations
• Throttling
• Orchestration
These are some of the many things APIs do.
13. APIs Do Lots of Things!
Of these, three are at the core: Data Gathering, Data Formatting and Data Delivery. All the others (Security, Authorization, Authentication, System Scaling, Discoverability, Data Consistency, Translations, Throttling, Orchestration) ultimately support them.
14. Definitions
• Data Gathering
– Retrieving the requested data from one or many local or remote data sources
• Data Formatting
– Preparing a structured payload for the requesting agent
• Data Delivery
– Delivering the structured payload to the requesting agent
20. Why do most API providers provide everything?
• API design tends to be easier for teams closer to the source
• Centralized API functions are easier to support
• Many APIs have a large set of unknown and external developers
22. Separation of Concerns
[Table skeleton: Data Gathering, Data Formatting and Data Delivery, each split between the API Consumer and the API Provider]
To be a better provider, the API should address the separation of concerns of the three core functions.
23-25. Separation of Concerns

| | Data Gathering | Data Formatting | Data Delivery |
|---|---|---|---|
| API Consumer | Doesn't care how data is gathered, as long as it is gathered | Each consumer cares a lot about the format for that specific use | Each consumer cares a lot about how the payload is delivered |
| API Provider | Cares a lot about how the data is gathered | Only cares about the format to the extent it is easy to support | Only cares about the delivery method to the extent it is easy to support |
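The table suggests splitting ownership: the provider owns gathering, while each consumer owns formatting and delivery for its own use. A minimal Python sketch of that split follows, assuming a hypothetical in-memory catalog and hypothetical function names (`gather_titles`, `register_adapter`, `handle_request`); serializing to a string stands in for delivery. It illustrates the idea only and is not Netflix's implementation.

```python
import json
from typing import Any, Callable, Dict, List

Title = Dict[str, Any]

# Hypothetical stand-in for the "local or remote data sources" behind the API.
_CATALOG: List[Title] = [
    {"id": i, "name": f"Title {i}", "synopsis": f"Synopsis {i}"} for i in range(1, 31)
]


def gather_titles(user_id: str, limit: int) -> List[Title]:
    """Provider concern: how data is fetched stays hidden behind this call
    (user_id is unused by this in-memory stub)."""
    return _CATALOG[:limit]


# Consumer concern: each device team registers how its payload is shaped and encoded.
Adapter = Callable[[List[Title]], str]
_ADAPTERS: Dict[str, Adapter] = {}


def register_adapter(device: str, adapter: Adapter) -> None:
    _ADAPTERS[device] = adapter


def handle_request(device: str, user_id: str) -> str:
    titles = gather_titles(user_id, limit=25)  # provider-owned gathering
    return _ADAPTERS[device](titles)           # consumer-owned formatting/delivery


# Two hypothetical consumers that want very different payloads.
register_adapter("tv", lambda ts: json.dumps(
    [{"id": t["id"], "name": t["name"], "synopsis": t["synopsis"]} for t in ts]))
register_adapter("lightweight", lambda ts: "\n".join(
    f'{t["id"]}|{t["name"]}' for t in ts))
```

The provider can change how `gather_titles` fetches data without touching any adapter, and a device team can change its payload without asking the provider to reshape a shared response.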
27. Should you consider alternatives to the one-size-fits-all API model?
Ingredients:
• A small number of targeted API consumers is the top priority
• Close relationships between these API consumers and the API team
• Increasing divergence of needs across the top-priority API consumers
• Strong desire by the API consumers for more optimized interactions with the API
• High value proposition for the company providing the API to make these API consumers as effective as possible
52. Because we are no longer catering to LSUDs (large sets of unknown developers), we can now focus on building a business on which all of Netflix can operate
53. Netflix API : Key Responsibilities
• Broker data between services and Devices
• Provide features and business logic
• Maintain a resilient front-door
• Scale the system
• Maintain high velocity
94. Netflix API : Key Responsibilities
• Broker data between services and Devices
• Provide features and business logic
• Maintain a resilient front-door
• Scale the system
• Maintain high velocity
111. Netflix API : Key Responsibilities
• Broker data between services and Devices
• Provide features and business logic
• Maintain a resilient front-door
• Scale the system
• Maintain high velocity
119. Single Canary Instance to Test New Code with Production Traffic (around 1% or less of traffic)
[Diagram: API Requests from the Internet; Current Code in Production; Single Canary Instance]
121. Single Canary Instance to Test New Code with Production Traffic (around 1% or less of traffic)
[Diagram: API Requests from the Internet; Current Code in Production; Single Canary Instance, now showing an "Error!" marker]
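The canary slides describe sending about 1% or less of production traffic to a single instance running new code alongside the current production code, and catching errors there before a full rollout. The sketch below approximates that share with a deterministic hash-based split per request and adds a simple error-rate comparison against the baseline. The routing function, the 1.5× ratio and the minimum-request rule are assumptions for illustration, not Netflix's canary analysis.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Stats:
    requests: int = 0
    errors: int = 0

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def route(request_id: str, canary_fraction: float = 0.01) -> str:
    """Deterministically send roughly `canary_fraction` of requests to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000  # 10,000 buckets = 0.01% granularity
    return "canary" if bucket < canary_fraction * 10_000 else "production"


def canary_is_healthy(canary: Stats, baseline: Stats,
                      max_ratio: float = 1.5, min_requests: int = 100) -> bool:
    """Fail the canary if its error rate exceeds the baseline's by more than
    `max_ratio` once it has handled at least `min_requests` requests."""
    if canary.requests < min_requests:
        return True  # too little traffic to judge yet
    # The floor keeps the threshold above zero when the baseline has no errors.
    allowed = max(baseline.error_rate * max_ratio, 0.001)
    return canary.error_rate <= allowed
```

Hashing an ID keeps the split stable, so a given request ID always lands on the same side. A fuller comparison would likely look at latency and other health metrics too, not just errors.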
Netflix strives to be the global streaming video leader for TV shows and movies
We now have more than 44 million global subscribers in more than 40 countries
Those subscribers consume more than a billion hours of streaming video a month which accounts for about 33% of the peak Internet traffic in the US.
Our 44 million Netflix subscribers are watching shows and movies on virtually any device that has a streaming video screen. We are now on more than 1,000 different device types.
The subscribers can watch our original shows like Emmy-winning House of Cards.
Within this world, the Edge Engineering team focuses on these three aspects of the streaming product.
Most companies focus on a small handful of device implementations, most notably Android and iOS devices.
At Netflix, we have more than 1,000 different device types that we support. Across those devices, there is a high degree of variability. As a result, we have seen inefficiencies and problems emerge across our implementations. Those issues also translate into issues with the API interaction.
For example, screen size could significantly affect what the API should deliver to the UI. TVs with bigger screens can potentially fit more titles and more metadata per title than a mobile phone. Do we need to send all of the extra bits for fields or items that are not needed, requiring the device itself to drop items on the floor? Or can we optimize the delivery of those bits on a per-device basis?
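Per-device optimization of this kind can be sketched as trimming a row on the server before it is sent. This is a minimal sketch under stated assumptions: `DeviceProfile`, `shape_row`, the two profiles, and their field sets are all hypothetical and only illustrate the idea of dropping unneeded items and fields before they cross the network.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical per-device capabilities; the fields are illustrative only."""
    name: str
    max_titles_per_row: int
    fields: frozenset

# Hypothetical profiles: a large-screen TV fits more titles and more metadata than a phone.
TV = DeviceProfile("tv", 10, frozenset({"id", "title", "synopsis", "boxart", "rating"}))
PHONE = DeviceProfile("phone", 4, frozenset({"id", "title", "boxart"}))

def shape_row(titles: list, device: DeviceProfile) -> list:
    """Trim a row on the server so the device never receives bits it would drop."""
    visible = titles[: device.max_titles_per_row]
    return [{k: v for k, v in t.items() if k in device.fields} for t in visible]

catalog = [
    {"id": i, "title": f"Title {i}", "synopsis": "...", "boxart": f"/art/{i}.jpg", "rating": 4.0}
    for i in range(20)
]
```

With these assumed profiles, `shape_row(catalog, PHONE)` returns four titles carrying only `id`, `title`, and `boxart`, while the TV receives ten titles with the full metadata.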
Different devices have different controlling functions as well. For devices with swipe technologies, such as the iPad, do we need to pre-load a lot of extra titles in case a user swipes the row quickly to see the last of 500 titles in their queue? Or for up-down-left-right controllers, would devices be more optimized by fetching a few items at a time when they are needed? Other devices support voice or hand gestures or pointer technologies. How might those impact the user experience and therefore the metadata needed to support them?
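The swipe-versus-directional-controller question above can be answered in more than one way. One possible policy is sketched below, assuming a hypothetical rule: preload a whole row for swipe devices and fetch a small window at a time for d-pad devices. `InputMode`, `page_size`, `plan_fetches`, and the five-item window are illustrative names and values, not anything from the talk.

```python
from enum import Enum

class InputMode(Enum):
    SWIPE = "swipe"  # touch devices where a user can fling through a row quickly
    DPAD = "dpad"    # up-down-left-right remote controls

def page_size(mode: InputMode, row_length: int, dpad_window: int = 5) -> int:
    # Hypothetical policy: preload the whole row for swipe, fetch a small window for d-pad.
    if mode is InputMode.SWIPE:
        return row_length
    return min(dpad_window, row_length)

def plan_fetches(row_ids: list, mode: InputMode) -> list:
    """Split a row into the batches the device would request, in order."""
    size = max(1, page_size(mode, len(row_ids)))  # guard against an empty row
    return [row_ids[i:i + size] for i in range(0, len(row_ids), size)]
```

For a 500-title queue this policy yields one large request on a swipe device and 100 small requests, issued as the user navigates, on a d-pad device.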
The technical specs on these devices differ greatly. Some have significant memory space while others do not, impacting how much data can be handled at a given time. Processing power and hard-drive space could also play a role in how the UI performs, in turn potentially influencing the optimal way for fetching content from the API. All of these differences could result in different potential optimizations across these devices.
Many UI teams needing metadata means many requests to the API team. In the one-size-fits-all API world, we essentially needed to funnel these requests and then prioritize them. That meant that some teams would need to wait for API work to be done. It also meant that, because they all shared the same endpoints, we were often adding variations to the endpoints, resulting in a more complex system as well as a lot of spaghetti code. Making teams wait due to prioritization was exacerbated by the fact that tasks took longer because the technical debt was increasing, causing time to build and test to increase. Moreover, many of the incoming requests were asking us to do more of the same kinds of customizations. This created a spiral that would be very difficult to break out of…
Many other companies have seen similar issues and have introduced orchestration layers that enable more flexible interaction models.
OData, YQL, ql.io, rest.li and others are examples of orchestration layers. They address the same problems that we have seen, but we have approached the solution in a very different way.
Our discussion ultimately evolved into a comparison of resource-based APIs and experience-based APIs.
The original one-size-fits-all (OSFA) API was very resource-oriented, with granular requests for specific data, delivering specific documents in specific formats.
The interaction model looked basically like this, with (in this example) the PS3 making many calls across the network to the OSFA API. The API ultimately called back to dependent services to get the corresponding data needed to satisfy the requests.
In this mode, there is a very clear divide between the Client Code and the Server Code. That divide is the network border.
And the responsibilities have the same distribution as well. The Client Code handles the rendering of the interface (as well as asking the server for data). The Server Code is responsible for gathering, formatting and delivering the data to the UIs.
And ultimately, it works. The PS3 interface looks like this and was populated by this interaction model.
But we believe this is not the optimal way to handle it. In fact, assembling a UI through many resource-based API calls is akin to a pointillist painting. The picture looks great when fully assembled, but it is built from many individual points placed together in just the right way.
We have decided to pursue an experience-based approach instead. Rather than making many API requests to assemble the PS3 home screen, the PS3 will potentially make a single request to a custom, optimized endpoint.
In an experience-based interaction, the PS3 can potentially make a single request across the network border to a scripting layer (currently Groovy), in this example to provide the data for the PS3 home screen. The call goes to a very specific, custom endpoint for the PS3 or for a shared UI. The Groovy script then interprets what is needed for the PS3 home screen and triggers a series of calls to the Java API running in the same JVM as the Groovy scripts. The Java API is essentially a series of methods that individually know how to gather the corresponding data from the dependent services. The Java API then returns the data to the Groovy script, which formats and delivers the very specific data back to the PS3.
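The split of responsibilities in the experience-based flow can be sketched in a few lines. This is an illustration only: the real adapters are Groovy scripts calling a Java API in the same JVM, whereas here `DataApi`, `ps3_home_adapter`, the in-memory data, and the six-titles-per-row limit are hypothetical Python stand-ins. The point is that gathering lives behind the API methods while the adapter owns the device-specific formatting.

```python
class DataApi:
    """Hypothetical stand-in for the server-side Java API: each method knows how to
    gather one kind of data from a dependent service (here, in-memory dicts)."""

    def __init__(self, lists_by_user: dict, titles_by_id: dict):
        self._lists = lists_by_user
        self._titles = titles_by_id

    def get_lists(self, user_id: str) -> list:
        return self._lists.get(user_id, [])

    def get_title(self, title_id: int) -> dict:
        return self._titles[title_id]

def ps3_home_adapter(api: DataApi, user_id: str, titles_per_row: int = 6) -> dict:
    """Hypothetical client adapter behind one device-specific endpoint.
    Data gathering is delegated to the API; formatting is owned by the adapter."""
    rows = []
    for lst in api.get_lists(user_id):
        ids = lst["title_ids"][:titles_per_row]
        rows.append({
            "name": lst["name"],
            # Emit only the fields this hypothetical home screen renders.
            "items": [{"id": tid, "name": api.get_title(tid)["title"]} for tid in ids],
        })
    return {"screen": "home", "rows": rows}

api = DataApi(
    {"u1": [{"name": "My List", "title_ids": [1, 2]}, {"name": "Trending Now", "title_ids": [2]}]},
    {1: {"id": 1, "title": "Title A", "synopsis": "..."}, 2: {"id": 2, "title": "Title B", "synopsis": "..."}},
)
```

One call to `ps3_home_adapter(api, "u1")` replaces what would otherwise be several resource-based round trips from the device.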
We also introduced RxJava into this layer to improve our ability to handle concurrency and callbacks. RxJava is open source in our github repository.
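RxJava is a Java library, so the sketch below swaps in Python's `asyncio` purely to show the underlying idea: issuing several dependency calls concurrently and combining their results when all have completed. The functions and data are hypothetical; this is not RxJava's API.

```python
import asyncio

# Simulated dependency calls; the names and data are hypothetical.
async def fetch_lists(user_id: str) -> list:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return ["My List", "Trending Now"]

async def fetch_ratings(user_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"Title A": 5}

async def home_screen(user_id: str) -> dict:
    # Start both calls at once and combine the results when both have completed,
    # instead of waiting for one before issuing the other.
    lists, ratings = await asyncio.gather(fetch_lists(user_id), fetch_ratings(user_id))
    return {"lists": lists, "ratings": ratings}
```

Running `asyncio.run(home_screen("u1"))` takes roughly one simulated latency rather than two, because the calls overlap.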
In this model, the border between Client Code and Server Code is no longer the network border. It is now back on the server. The Groovy is essentially a client adapter written by the client teams.
And the distribution of work changes as well. The client teams continue to handle UI rendering, but now are also responsible for the formatting and delivery of content. The API team, in terms of the data side of things, is responsible for the data gathering and hand-off to the client adapters. Of course, the API team does many other things, including resiliency, scaling, dependency interactions, etc. This model is essentially a platform for API development.
If resource-based APIs assemble data like pointillism, experience-based APIs assemble data like a photograph. The experience-based approach captures and delivers it all at once.
And as we all know, the bigger the ship, the slower it turns. That was very much the case for Netflix years ago.
To grow to where we knew we needed to be, Netflix aggressively moved to a distributed architecture.
I like to think of this distributed architecture as being shaped like an hourglass…
At the top end of the hourglass, we have our device and UI teams who build out great user experiences on Netflix-branded devices. To put that into perspective, we support a few hundred more device types than there are engineers at Netflix.
At the bottom end of the hourglass, there are several dozen dependency teams who focus on things like metadata, algorithms, authentication services, A/B test engines, etc.
The API is at the center of the hourglass, acting as a broker of data.
Our distributed architecture, with the number of systems involved, can get quite complicated. Each of these systems talks to a large number of other systems within our architecture.
Assuming each of the services has an SLA of four nines (99.99% availability), the combined availability results in more than two hours of downtime per month.
And that is if all services maintain four nines!
If availability degrades to three nines (99.9%), that is almost one day per month of downtime!
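The downtime figures above follow from multiplying availabilities along the dependency chain. The number of services is not stated in this text, so the sketch below assumes 30 dependencies, independent failures, a request that needs every dependency, and a 30-day month; under those assumptions it reproduces both claims.

```python
HOURS_PER_MONTH = 30 * 24  # assumes a 30-day month

def monthly_downtime_hours(per_service_availability: float, num_services: int) -> float:
    """Expected monthly downtime when a request needs every dependency to be up.

    Assumes independent failures, so the combined availability is the product
    of the individual availabilities.
    """
    combined = per_service_availability ** num_services
    return (1 - combined) * HOURS_PER_MONTH

# Assumption: 30 dependencies (an illustrative count, not taken from the slide).
print(round(monthly_downtime_hours(0.9999, 30), 2))  # four nines: 2.16 hours
print(round(monthly_downtime_hours(0.999, 30), 2))   # three nines: 21.29 hours
```

Even excellent per-service availability compounds: 0.9999^30 is about 99.7%, and 0.999^30 is about 97%.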
So, back to the hourglass…
In the old world, the system was vulnerable to such failures. For example, if one of our dependency services fails…
Such a failure could have resulted in an outage in the API.
And that outage likely would have cascaded to have some kind of substantive impact on the devices.
The challenge for the API team is to be resilient against dependency outages, to ultimately insulate Netflix customers from low level system problems and to keep them happy.
To solve this problem, we created Hystrix, a wrapping technology that provides fault tolerance in a distributed environment. Hystrix is also open source and available in our github repository.
To achieve this, we implemented a circuit breaker for each library that we depend on. Each circuit breaker controls the interaction between the API and that dependency. This image is a view of the dependency monitor that allows us to view the health and activity of each dependency. This dashboard is designed to give a real-time view of what is happening with these dependencies (over the last two minutes). We have other dashboards that provide insight into longer-term trends, day-over-day views, etc.
This is a view of a single circuit.
This circle represents the call volume and health of the dependency over the last 10 seconds. This circle is meant to be a visual indicator for health. The circle is green for healthy, yellow for borderline, and red for unhealthy. Moreover, the size of the circle represents the call volumes, where bigger circles mean more traffic.
The blue line represents the traffic trends over the last two minutes for this dependency.
The green number shows the number of successful calls to this dependency over the last two minutes.
The yellow number shows the number of latent calls into the dependency. These calls ultimately return successful responses, but slower than expected.
The blue number shows the number of calls that were handled by the short-circuited fallback mechanisms. That is, if the circuit gets tripped, the blue number will start to go up.
The orange number shows the number of calls that have timed out, resulting in fallback responses.
The purple number shows the number of calls that fail due to queuing issues, resulting in fallback responses.
The red number shows the number of exceptions, resulting in fallback responses.
The error rate is calculated as the total number of error and fallback responses divided by the total number of calls handled.
If the error rate exceeds a configured threshold, the circuit is automatically opened and calls are routed to the fallback scenario. When the error rate returns below that threshold, the circuit is closed again.
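The error-rate circuit breaker described above can be sketched as follows. This is not Hystrix's implementation and does not use its API. The threshold, minimum call volume, window lengths, and injectable clock are assumptions. While the circuit is open no calls reach the dependency, so the error rate cannot be re-measured directly. As a common design choice, the sketch therefore lets a single trial call through after a sleep window and closes the circuit if it succeeds.

```python
import time
from collections import deque
from typing import Callable, Optional

class CircuitBreaker:
    """Minimal error-rate circuit breaker (a sketch, not Hystrix's implementation).

    Error rate = failed calls / all calls in a rolling window of `window` seconds.
    When it exceeds `threshold` (after at least `min_volume` calls), the circuit
    opens and calls short-circuit to the fallback. After `sleep_window` seconds a
    single trial call is allowed through: success closes the circuit, failure
    keeps it open for another sleep window.
    """

    def __init__(self, threshold: float = 0.5, min_volume: int = 20, window: float = 10.0,
                 sleep_window: float = 5.0, clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.min_volume = min_volume
        self.window = window
        self.sleep_window = sleep_window
        self.clock = clock
        self.events = deque()  # (timestamp, succeeded) pairs inside the rolling window
        self.opened_at: Optional[float] = None

    def _record(self, succeeded: bool) -> None:
        now = self.clock()
        self.events.append((now, succeeded))
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()  # drop events older than the rolling window

    def error_rate(self) -> float:
        if len(self.events) < self.min_volume:
            return 0.0  # not enough traffic to judge
        failures = sum(1 for _, ok in self.events if not ok)
        return failures / len(self.events)

    def call(self, fn: Callable[[], object], fallback: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.sleep_window:
                return fallback()  # short-circuited: the dependency is not called at all
            try:  # sleep window elapsed: allow one trial call
                result = fn()
            except Exception:
                self.opened_at = self.clock()
                return fallback()
            self.opened_at = None
            self.events.clear()
            return result
        try:
            result = fn()
        except Exception:
            self._record(False)
            if self.error_rate() > self.threshold:
                self.opened_at = self.clock()
            return fallback()
        self._record(True)
        return result

# Usage with a fake clock so the behavior is deterministic.
now = [0.0]
breaker = CircuitBreaker(threshold=0.5, min_volume=4, sleep_window=5.0, clock=lambda: now[0])

def broken():
    raise ConnectionError("dependency down")

for _ in range(4):
    breaker.call(broken, lambda: "fallback")
print(breaker.opened_at is not None)  # True: 4 of 4 calls failed
```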
The dashboard also shows host and cluster information for the dependency.
As well as information about our SLAs.
So, going back to the engineering diagram…
If that same service fails today…
We simply disconnect from that service.
And replace it with an appropriate fallback. The fallback is ideally a slightly degraded but still useful offering. If we cannot provide that, however, we quickly return a 5xx response, which helps the system shed load rather than queue things up (which could eventually cause the system as a whole to tip over).
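The fallback-then-fail-fast behavior can be sketched as a small request handler. The function names, the unpersonalized "popular titles" fallback, and the choice of 503 as the 5xx status are illustrative assumptions rather than details from the talk.

```python
from typing import Callable, Optional, Tuple

def serve(primary: Callable[[], dict], fallback: Optional[Callable[[], dict]] = None) -> Tuple[int, dict]:
    """Return (HTTP status, body) for one request to a dependency-backed feature."""
    try:
        return 200, primary()
    except Exception:
        pass
    if fallback is not None:
        try:
            # Degraded but still useful, e.g. unpersonalized rows instead of personalized ones.
            return 200, fallback()
        except Exception:
            pass
    # No usable fallback: fail fast so load is shed instead of queued.
    return 503, {"error": "service unavailable"}

def personalized_rows() -> dict:
    raise TimeoutError("recommendations dependency timed out")  # simulated outage

def popular_rows() -> dict:
    return {"rows": ["Popular titles"]}  # hypothetical degraded response
```

When the dependency fails, `serve(personalized_rows, popular_rows)` still answers with a usable page. Without a fallback, `serve(personalized_rows)` returns a 503 immediately instead of holding the request.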
This will keep our customers happy, even if the experience may be slightly degraded. It is important to note that different dependency libraries have different fallback scenarios. And some are more resilient than others. But the overall sentiment here is accurate at a high level.
In addition to the migration to a distributed architecture, we also aggressively moved out of data centers…
And into the cloud.
Instead of spending our time in data centers, we spend it in tools such as Asgard, created by Netflix staff, to help us manage our instance types and counts in AWS. Asgard is available in our open source repository at github.
Another feature afforded to us through AWS to help us scale is autoscaling. This chart shows Netflix API request rates over a span of time. The red line represents the capacity that would be needed in a data center to ensure that the spikes could be handled without spending a ton more than is needed for the really unlikely scenarios.
Through autoscaling, instead of buying new servers based on projected spikes in traffic and having systems administrators add them to the farm, the cloud can dynamically and automatically add and remove servers based on need.
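The arithmetic behind reactive scaling can be sketched as sizing the fleet from the load observed right now. This is not the AWS Auto Scaling API. The per-instance capacity, the 25% headroom, and the minimum fleet size are hypothetical parameters.

```python
import math

def desired_instances(current_rps: float, rps_per_instance: float,
                      headroom: float = 0.25, min_instances: int = 2) -> int:
    """Reactive sizing: derive the instance count from load observed right now."""
    needed = current_rps * (1 + headroom) / rps_per_instance
    return max(min_instances, math.ceil(needed))
```

Because the input is a real-time measurement, new capacity is requested only after the load has already arrived. During a sharp ramp, instances may still be booting when they are needed.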
To offset the limitations of purely reactive autoscaling, we created Scryer (not yet open sourced, but in production at Netflix).
Instead of reacting to real-time metrics, like load average, to increase/decrease the instance count, we can look at historical patterns in our traffic to figure out what will be needed BEFORE it is needed. We believed we could write algorithms to predict the needs.
This is the result of the algorithms we created for the predictions. The prediction closely matches the actual traffic.
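Scryer's algorithms are not described here, so the sketch below substitutes a deliberately simple predictor to show the idea of scaling before traffic arrives. It forecasts each interval of the next day as the average of the same interval on previous days, then converts that forecast into an instance count. The history values and per-instance capacity are hypothetical.

```python
import math
from statistics import mean

def plan_capacity(history: list, rps_per_instance: float) -> list:
    """history[d][i] is the observed RPS for interval i on past day d.
    Returns an instance count for each interval of the next day, computed
    before that day starts, by averaging the same interval across past days."""
    intervals = len(history[0])
    plan = []
    for i in range(intervals):
        predicted = mean(day[i] for day in history)
        plan.append(math.ceil(predicted / rps_per_instance))
    return plan

history = [
    [100, 400, 900, 300],   # day 1: hypothetical RPS per interval
    [120, 380, 1100, 300],  # day 2
]
print(plan_capacity(history, rps_per_instance=100))  # [2, 4, 10, 3]
```

A production predictor would also need to handle weekly seasonality, holidays, and outliers. This sketch only shows where the prediction plugs into capacity planning.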
Going global has a different set of scaling challenges. AWS enables us to add instances in new regions that are closer to our customers.
To help us manage our traffic across regions, as well as within given regions, we created Zuul. Zuul is open source in our github repository.
Zuul does a variety of things for us. Zuul fronts our entire streaming application as well as a range of other services within our system.
Hystrix and other techniques throughout our engineering organization help keep things resilient. We also have an army of tools that introduce failures to the system which will help us identify problems before they become really big problems.
The army is the Simian Army, which is a fleet of monkeys who are designed to do a variety of things, in an automated way, in our cloud implementation. Chaos Monkey, for example, periodically terminates AWS instances in production to see how the system as a whole will respond once that server disappears. Latency Monkey introduces latencies and errors into a system to see how it responds. The system is too complex to know how things will respond in various circumstances, so the monkeys expose that information to us in a variety of ways. The monkeys are also available in our open source github repository.
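The idea behind Chaos Monkey can be simulated without touching any cloud resources. The sketch below makes no AWS calls and is not how Chaos Monkey is implemented. It repeatedly removes one randomly chosen instance from a hypothetical fleet and checks whether a hypothetical minimum-capacity requirement still holds.

```python
import random

def chaos_round(fleet: list, rng: random.Random) -> tuple:
    """Terminate one randomly chosen instance; return (survivors, victim)."""
    victim = rng.choice(fleet)
    survivors = [instance for instance in fleet if instance != victim]
    return survivors, victim

def can_serve(fleet: list, min_healthy: int) -> bool:
    """Hypothetical resilience check: enough instances remain to carry the load."""
    return len(fleet) >= min_healthy

rng = random.Random(42)  # seeded so runs are repeatable
fleet = [f"api-{n}" for n in range(6)]
for _ in range(3):
    fleet, victim = chaos_round(fleet, rng)
    print(f"terminated {victim}; still serving: {can_serve(fleet, min_healthy=3)}")
```

In the real system, the interesting output is not the check itself but how the rest of the architecture behaves once the instance disappears.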
Again, the dependency chains in our system are quite complicated.
That is a lot of change in the system!
As a result, our philosophy is to act fast (i.e., get code into production as quickly as possible), then react fast (i.e., respond to issues quickly as they arise).
Two examples of this approach are canary deployments and what we call red/black deployments.
The canary deployments are comparable to canaries in coal mines. We have many servers in production running the current codebase. We will then introduce a single (or perhaps a few) new server(s) into production running new code. Monitoring the canary servers will show what the new code will look like in production.
If the canary encounters problems, it will register in any number of ways. The problems will be determined based on a comprehensive set of tools that will automatically perform health analysis on the canary.
The health analysis of the canary is automated as well, comparing its metrics against the fleet of production servers.
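An automated comparison of a canary against the production fleet can be sketched as a per-metric check against the fleet average. The metric names, the 20% relative tolerance, and the direction map are assumptions for illustration. Real canary analysis tooling is considerably more sophisticated.

```python
from statistics import mean

def canary_report(canary: dict, fleet: list, higher_is_worse: dict, tolerance: float = 0.2) -> dict:
    """Compare each canary metric with the production-fleet average.

    A metric passes if the canary is no more than `tolerance` (relative) worse
    than the baseline, where "worse" depends on the metric's direction.
    """
    report = {}
    for metric, worse_if_higher in higher_is_worse.items():
        baseline = mean(host[metric] for host in fleet)
        if worse_if_higher:
            report[metric] = canary[metric] <= baseline * (1 + tolerance)
        else:
            report[metric] = canary[metric] >= baseline * (1 - tolerance)
    return report

fleet = [
    {"latency_ms": 100, "error_rate": 0.010, "rps": 200},
    {"latency_ms": 110, "error_rate": 0.010, "rps": 210},
]
canary = {"latency_ms": 150, "error_rate": 0.011, "rps": 200}
report = canary_report(canary, fleet, {"latency_ms": True, "error_rate": True, "rps": False})
print(report)  # {'latency_ms': False, 'error_rate': True, 'rps': True}
print("pass" if all(report.values()) else "fail")  # fail
```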
If the canary shows errors, we pull it/them down, re-evaluate the new code, debug it, etc.
We will then repeat the process until the analysis of the canary servers looks good.
We also use Zuul to funnel varying degrees of traffic to the canaries to evaluate how much load the canary can take relative to the current production instances. If the RPS, for example, drops, the canary may fail the Zuul stress test.
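Sending a chosen share of traffic to a canary can be sketched as hashing each request into one of 100 buckets. This is a hypothetical standalone function, not Zuul's API or configuration. The percentages in the ramp are illustrative, starting near the roughly 1% shown on the canary slides.

```python
import hashlib

def bucket(request_id: str) -> int:
    """Map a request id to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100

def route(request_id: str, canary_percent: int) -> str:
    """Send roughly `canary_percent`% of requests to the canary pool."""
    return "canary" if bucket(request_id) < canary_percent else "production"

# Hypothetical ramp: start small, then stress the canary with a larger share.
ids = [f"req-{n}" for n in range(10_000)]
for percent in (1, 10, 25):
    share = sum(route(i, percent) == "canary" for i in ids) / len(ids)
    print(f"{percent}% target -> {share:.1%} observed")
```

Hashing rather than random sampling keeps each request id on the same side of the split, which makes a misbehaving request easier to reproduce.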
If the new code looks good in the canary, we can then use a technique that we call red/black deployments to launch the code. Start with red, where production code is running. Fire up a new set of servers (black) running the new code, equal in count to the red servers.
Then switch the pointer to have external requests point to the black servers. Sometimes, however, we may find an error in the black cluster that was not detected by the canary. For example, some issues can only be seen with full load.
If a problem is encountered from the black servers, it is easy to rollback quickly by switching the pointer back to red. We will then re-evaluate the new code, debug it, etc.
Once we have debugged the code, we will put another canary up to evaluate the new changes in production.
And we will stress the canary again…
If the new code looks good in the canary, we can then bring up another set of servers with the new code.
Then we will switch production traffic to the new code.
If everything still looks good, we disable the red servers and the new code becomes the new red servers.
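The red/black cycle can be sketched as a pointer that can be flipped back to the previous cluster. `Cluster`, `TrafficPointer`, and the version labels are hypothetical stand-ins for whatever actually directs external requests. The key property is that the old cluster stays up until the new one has proven itself, so rollback is just another pointer switch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cluster:
    color: str
    version: str
    size: int

class TrafficPointer:
    """Hypothetical stand-in for whatever directs external requests to a cluster."""

    def __init__(self, live: Cluster):
        self.live = live
        self.standby: Optional[Cluster] = None  # previous cluster, kept up for fast rollback

    def switch_to(self, new: Cluster) -> None:
        self.standby, self.live = self.live, new

    def rollback(self) -> None:
        if self.standby is None:
            raise RuntimeError("no standby cluster to roll back to")
        self.live, self.standby = self.standby, self.live

    def retire_standby(self) -> Cluster:
        """Disable the old cluster once the new one has proven itself."""
        if self.standby is None:
            raise RuntimeError("no standby cluster to retire")
        old, self.standby = self.standby, None
        return old

red = Cluster("red", "v1", size=40)
pointer = TrafficPointer(red)

pointer.switch_to(Cluster("black", "v2", size=red.size))  # same count as red, new code
pointer.rollback()  # a problem only visible at full load: point back to red

pointer.switch_to(Cluster("black", "v2-fix", size=red.size))  # after debugging and a new canary
retired = pointer.retire_standby()  # the new code becomes the new red
print(pointer.live.version, retired.version)
```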
All of the open source components discussed here, as well as many others, can be found at the Netflix github repository.