The new Netflix API
Why more complexity must lead to more simplicity
Katharina Probst
DevNexus 2017
Today’s architecture
[Diagram: Gateway → API Server JVM → Netflix microservices, with network boundaries between them. Inside the API Server JVM: device scripts (groovy) and client libraries (Client lib A … Client lib Z). Labels: "Js", "(mostly) java".]
What is the Netflix API?
Raison d’Être
Is the API just one gigantic translation layer?
Is it a routing layer?
If it’s too complex, can we just get rid of it?
Raison d’Être.
1. Orchestration
2. Availability protection
3. Abstraction
1. Orchestration
Simple example: search
[Diagram: search request → response, assembled from RelatedTerms, People, and Titles]
● Search service provides related search terms
● Search service provides IDs for videos and people
○ IDs depend on various factors, e.g., different catalogs in different countries
● For each ID, we need metadata
○ Titles
○ Images
○ Names
○ Ratings
○ etc.
● ..., and that metadata in turn depends on
○ Country
○ A/B tests the user is in
○ etc.
Response:
❏ Hydrated videos
❏ People names
❏ Query suggestions
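The search example above can be sketched in Java, the language of the API server. This is a minimal sketch, not Netflix's code: `SearchClient`, `MetadataClient`, and `SearchResponse` are hypothetical stand-ins for the real client libraries, and only titles are hydrated to keep it short. It shows the API owning the order of operations: related terms and video IDs are requested concurrently, each ID is then hydrated with country-specific metadata in parallel, and the partial results are merged into one response.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical downstream clients; not Netflix's actual client libraries.
interface SearchClient {
    CompletableFuture<List<String>> relatedTerms(String query);
    CompletableFuture<List<String>> videoIds(String query, String country);
}

interface MetadataClient {
    // Metadata can vary by country (and, per the slide, by A/B test cell).
    CompletableFuture<String> title(String videoId, String country);
}

// Hypothetical merged response: hydrated titles plus query suggestions.
final class SearchResponse {
    final List<String> titles;
    final List<String> suggestions;

    SearchResponse(List<String> titles, List<String> suggestions) {
        this.titles = titles;
        this.suggestions = suggestions;
    }
}

final class SearchOrchestrator {
    private final SearchClient searchClient;
    private final MetadataClient metadataClient;

    SearchOrchestrator(SearchClient searchClient, MetadataClient metadataClient) {
        this.searchClient = searchClient;
        this.metadataClient = metadataClient;
    }

    CompletableFuture<SearchResponse> search(String query, String country) {
        // Independent calls are started without waiting on each other.
        CompletableFuture<List<String>> suggestions = searchClient.relatedTerms(query);

        // Hydration depends on the IDs, so it is chained after the ID lookup;
        // the metadata for each ID is then fetched in parallel.
        CompletableFuture<List<String>> titles = searchClient.videoIds(query, country)
                .thenCompose(ids -> {
                    List<CompletableFuture<String>> calls = ids.stream()
                            .map(id -> metadataClient.title(id, country))
                            .collect(Collectors.toList());
                    return CompletableFuture.allOf(calls.toArray(new CompletableFuture<?>[0]))
                            .thenApply(done -> calls.stream()
                                    .map(CompletableFuture::join)
                                    .collect(Collectors.toList()));
                });

        // Merge the partial results into a single response.
        return titles.thenCombine(suggestions, SearchResponse::new);
    }
}
```

Filtering results or retrieving more data when needed would slot in as further stages on these futures.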
Orchestration
● Own order of operations
● Provide whatever info clients/services need
○ From other clients/libraries/services
○ From request
● Merge partial results
● Filter results
● Retrieve more info if necessary
● Support mutations (e.g., profile switch)
● Support complex transactions in a limited way
2. Availability protection
Prevent this as much as possible
What do customers want?
● No personalized recommendations, or no ability to stream?
● No search, or no ability to continue watching the movie you started last night?
● No cutting-edge A/B experiment experience, or no ability to stream?
Top priority: customer experience
● Top priority of top priority: customer can stream videos
● This means API cannot go down entirely
○ If it does, we have an outage
● But some services are not critical to this mission
○ A/B - if we don’t know which A/B tests you’re in, you can still get the default experience
○ Search - if you can’t search, you can still browse
Exposure to failures
● As your app grows, your set of dependencies is much more likely to get bigger than smaller
● Overall uptime = (dependency uptime)^(number of dependencies), assuming dependencies fail independently
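To make the uptime formula concrete, here is a small worked example. It assumes every dependency is required, fails independently, and has the same uptime; the 99.99% and 30-dependency figures are illustrative, not numbers from the talk.

```java
import java.util.Locale;

// Assumes every dependency is required, fails independently, and has the same uptime.
final class Uptime {
    static double overall(double dependencyUptime, int numDependencies) {
        return Math.pow(dependencyUptime, numDependencies);
    }

    public static void main(String[] args) {
        double uptime = overall(0.9999, 30);        // 30 dependencies at 99.99% each
        double hoursDown = (1 - uptime) * 30 * 24;  // expected downtime in a 30-day month
        System.out.printf(Locale.ROOT, "uptime=%.4f, ~%.1f h down per 30-day month%n", uptime, hoursDown);
        // prints: uptime=0.9970, ~2.2 h down per 30-day month
    }
}
```

Each dependency alone would be down under five minutes a month, yet together they cost more than two hours, which is why isolating failures matters.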
Hystrix
● Fault-tolerance pattern as a library
● Provides operational insights in real-time
● Automatic load-shedding under pressure
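The core idea Hystrix packages, a circuit breaker that stops calling a failing dependency and serves a fallback instead, can be sketched in plain Java. This is not Hystrix's API (Hystrix wraps calls in `HystrixCommand` subclasses); the class name, threshold, and open duration here are hypothetical.

```java
import java.util.function.Supplier;

// Minimal circuit breaker illustrating the pattern; NOT Hystrix's API.
final class CircuitBreaker {
    private final int failureThreshold; // consecutive failures before the circuit opens
    private final long openMillis;      // how long to short-circuit before letting calls through again
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // shed load: don't touch the failing dependency
        }
        try {
            T result = primary.get();
            onSuccess();
            return result;
        } catch (RuntimeException ex) {
            onFailure();
            return fallback.get();
        }
    }

    // After openMillis has elapsed, calls are let through again as trials.
    private synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        openedAt = -1;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = System.currentTimeMillis(); // (re)open the circuit
        }
    }
}
```

The state checks are synchronized but the dependency call is not, so a slow dependency does not serialize callers; a production library adds timeouts, metrics, and a bounded number of trial calls on top.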
Availability protection
[Diagram, repeated across several slides: behind the Gateway, the API runs scripts and client libraries (Search client lib, Ratings client lib, Cust client lib, Client lib B … Client lib Z), which call the Search, Ratings, and Customers services across network boundaries.]
If you don’t plan for failure
[Diagram: same architecture]
If you do plan for failure
[Diagram: same architecture] No search results >> no Netflix
Fallbacks
[Diagram: same architecture] Return static or stale rating
How to handle errors
return getRatings(id);

try {
    return getRatings(id);
} catch (Exception ex) {
    // static value
    return null;
}

try {
    return getRatings(id);
} catch (Exception ex) {
    // TODO What to return here?
}
Handle errors with fallbacks
● Some options for fallbacks
○ Static value
○ Value from in-memory
○ Value from cache
○ Value from network
○ Throw
○ Code
● Make error-handling explicit
● Applications have to work in the presence of either fallbacks or rethrown exceptions
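As a contrast to the `return null` catch block, here is a sketch of an explicit fallback for a ratings call, following the "Return static or stale rating" idea: serve the last known (stale) value from memory, else a static default. The class, the default of 3.0, and passing the remote call as a `Supplier` are all hypothetical choices for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical ratings lookup that makes its fallback explicit instead of returning null.
final class RatingsWithFallback {
    // Assumption: a neutral static rating is acceptable when nothing better is known.
    static final double STATIC_DEFAULT = 3.0;

    // Last successfully fetched value per video, served when the remote call fails (stale data).
    private final Map<String, Double> lastKnown = new ConcurrentHashMap<>();

    double getRating(String videoId, Supplier<Double> remote) {
        try {
            double fresh = remote.get(); // a null result unboxes to an NPE and is treated as a failure
            lastKnown.put(videoId, fresh);
            return fresh;
        } catch (RuntimeException ex) {
            // Fallback order: stale in-memory value first, then the static value.
            return lastKnown.getOrDefault(videoId, STATIC_DEFAULT);
        }
    }
}
```

Callers always get a usable number, so the error handling is visible in one place rather than surfacing later as a null.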
Availability protection beyond Hystrix
● Throttling
● Retries
● Timeouts
● Canaries
● Regional rollouts
● Traffic shifting
● Outlier detection (and elimination)
● Advanced load balancing
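Retries and timeouts work best together: a retry without a timeout can wait indefinitely, and retries without backoff can pile load onto a struggling service. A minimal sketch, assuming Java 9+ `CompletableFuture` (`orTimeout`, `delayedExecutor`); the attempt count, timeout, and backoff values are hypothetical and would be tuned per dependency.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: retries with a per-attempt timeout and exponential backoff (Java 9+).
final class Retry {
    // `call` should report failure through the returned future rather than by throwing.
    static <T> CompletableFuture<T> withRetries(Supplier<CompletableFuture<T>> call,
                                                int maxAttempts, long timeoutMs, long backoffMs) {
        // orTimeout fails the future if it is not done in time; it does not cancel the underlying work.
        CompletableFuture<T> attempt = call.get().orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
        if (maxAttempts <= 1) {
            return attempt; // last attempt: propagate success or failure as-is
        }
        return attempt
                .thenApply(value -> CompletableFuture.completedFuture(value))
                .exceptionally(error -> {
                    // Wait backoffMs, then retry with one fewer attempt and doubled backoff.
                    Executor delayed = CompletableFuture.delayedExecutor(backoffMs, TimeUnit.MILLISECONDS);
                    return CompletableFuture.supplyAsync(() -> null, delayed)
                            .thenCompose(ignored -> withRetries(call, maxAttempts - 1, timeoutMs, backoffMs * 2));
                })
                .thenCompose(next -> next);
    }
}
```

Retrying is only safe for idempotent reads; mutations such as a profile switch need more care.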
3. Abstraction
Abstraction goals
● Shield all device teams from every single mid-tier change … at least for a time. This lets us move more independently
● Shield all device teams from every single platform/infrastructure change
● Provide APIs not provided by downstream services
○ Find all movies that...
○ Length of movie
● Implementation flexibility, e.g.,
○ Caching
○ Batch APIs
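One way to picture the abstraction goal, and the "Length of movie" API, is a stable interface that device scripts code against, with adapters absorbing mid-tier changes. Everything here is hypothetical: the interface names, the two mid-tier versions, and their formats (seconds versus an "HH:MM:SS" string) were invented for illustration.

```java
// Hypothetical stable contract that device scripts code against.
interface VideoApi {
    int lengthInMinutes(String videoId);
}

// Hypothetical mid-tier clients: an older one reporting seconds, a newer one an "HH:MM:SS" string.
interface RuntimeServiceV1 {
    int runtimeSeconds(String videoId);
}

interface RuntimeServiceV2 {
    String runtime(String videoId);
}

// Adapter over the old mid-tier service.
final class VideoApiOnV1 implements VideoApi {
    private final RuntimeServiceV1 service;

    VideoApiOnV1(RuntimeServiceV1 service) {
        this.service = service;
    }

    @Override
    public int lengthInMinutes(String videoId) {
        return service.runtimeSeconds(videoId) / 60; // whole minutes
    }
}

// Adapter over the new mid-tier service; device scripts don't change when the API switches to it.
final class VideoApiOnV2 implements VideoApi {
    private final RuntimeServiceV2 service;

    VideoApiOnV2(RuntimeServiceV2 service) {
        this.service = service;
    }

    @Override
    public int lengthInMinutes(String videoId) {
        String[] hms = service.runtime(videoId).split(":"); // e.g. "01:30:00"
        return Integer.parseInt(hms[0]) * 60 + Integer.parseInt(hms[1]); // whole minutes
    }
}
```

The same seam is where caching or a batch API could be added without device teams noticing.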
Abstraction challenges
● Tech debt
● Device teams can end up with a black-box view (“api == cloud”)
● But isn’t the API team the bottleneck?
○ Yes, sometimes. But this organizational structure makes it less of a problem than m mid-tier teams dealing directly with n device teams
● But: separation of concerns
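A rough way to see the organizational argument is to count coordination relationships, under the simplifying assumption that without an API every mid-tier team must coordinate with every device team, while with one each team coordinates only with the API team. The team counts in the example are hypothetical.

```latex
\underbrace{m \cdot n}_{\text{without a central API}}
\quad\text{vs.}\quad
\underbrace{m + n}_{\text{with a central API}},
\qquad \text{e.g. } m = 30,\ n = 10:\ 300 \text{ vs. } 40
```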
Server-side logic
Reminder: Today’s architecture
[Diagram: Gateway → API (scripts and client libraries Client lib A … Client lib Z) → Netflix microservices, separated by network boundaries; annotated "~2100 active"]
Device teams write server-side logic
● Decoupling teams → better velocity
● UI teams are empowered to
○ Change presentation
○ Filter
○ Add users to A/B tests, which can then lead to, e.g., a different layout
What if we didn’t have an API?
[Diagram, two slides: today’s architecture, highlighting that "Device teams own client-side applications …" and "...and groovy scripts"]
What if? Implications for device teams
● Each device team would have to own
○ Orchestration
○ Frequent dependency updates (currently attempted daily)
○ Higher-level APIs (all movies that…)
○ Fallbacks and other resiliency protection (e.g., timeouts, retries)
● Recent example
○ A library upgrade caused a lot of NPEs. Why?
○ Worked with the team to find out why
○ Once fixed, the NPEs were gone, but performance degraded instead
● Should all teams be dealing with this?
[Diagram, two slides: today’s architecture, highlighting that "Service teams own services..." and "...and client libraries"]
What if? Implications for service teams
● Could only make breaking changes once every device team using their service upgrades
● Would not get resiliency protection (e.g., timeouts, load balancing, retries, fallbacks) unless every device team using their service provides it
● Should all teams be dealing with this?
What if? Implications for Netflix
● Lower velocity due to tight coupling between many mid-tier teams and many device teams
OR:
THE DOWNSIDE OF CENTRALIZATION
Where are we today?
● Principle: don’t repeat logic
○ It’s better to do it once in the API than n times for n devices.
● The principle is good, but it leads to complexity
What complexity challenges do we have?
Complexity challenges
● Frequent (not always canaried) updates to a critical service in production
● Difficulty of debugging (esp. for groovy script writers)
● Slow server startup times
● Lack of operational insights into script resource consumption
● Difficulty of performance profiling
● Lack of feedback loop
● Decoupled code versioning and transitive dependencies
Where are we going next?
Top priorities
● Move groovy scripts out
● Split up API
New architecture: Edge PaaS
[Diagram: Gateway and EAS in front of Node apps (NodeQuark) running on Titus, which call a Java API with client libraries (Client lib A … Client lib Z) in front of the Netflix microservices, separated by network boundaries]
Edge Auth Service (EAS)
● Auth termination
● Centralized place for auth
Edge PaaS:
● Platform for node scripts
● Developer tooling for entire SDLC
● Remote API with optimized data access (Falcor)
Two (or more) APIs
[Diagram: the API split into separate APIs: one with DNA clients (DNAClient A, B, … Z) calling DNA services (DNA Service A, B, C, … Z), one with PB clients (PB Client B, C, … Z) calling PB services (PB Service A, B, C, … Z), plus shared clients (Shared Client A, C, …) calling shared services (Shared Service A, B, … Z); Gateway, EAS, and Node apps (NodeQuark) on Titus in front]
Split API by function
NodeQuark Platform
[Diagram: Edge PaaS architecture (Zuul, EAS, Node apps on NodeQuark running on Titus, Java API with client libraries, Netflix microservices), highlighting: Platform for node scripts]
Edge PaaS: Node Platform
● Node apps run in containers on the Titus platform
● Node Platform provides
○ Integration into Netflix ecosystem (e.g., discovery)
○ Logging
○ Dashboards and metrics out of the box, with the option to customize
○ Support for mocking and testing
● Titus provides
○ Scheduling
○ Autoscaling
Developer experience
[Diagram: Edge PaaS architecture (Gateway, EAS, Node apps on NodeQuark running on Titus, Java API with client libraries, Netflix microservices), highlighting: Developer tooling for entire SDLC]
Edge PaaS: Developer tooling
● Command line tool for node apps
○ Setup
○ Starting apps
○ Deploying apps
● Local development and debugging of node apps
● UI for lifecycle management, e.g., version management
● One-click rollouts, one-click rollbacks
● Versioning
Remote API
[Diagram: Edge PaaS architecture (Zuul, EAS, Node apps on NodeQuark running on Titus, API with client libraries, Netflix microservices), highlighting: Remote API with optimized data access]
Edge PaaS: Remote API
● API still takes care of
○ Orchestration
○ Resiliency protection
○ Abstraction
● Optimized access with Falcor
○ “RESTful composition” with caching
● Binary transport
● Future: channel support
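The "RESTful composition with caching" idea can be illustrated with a path-keyed cache that sends only cache misses to the server, in one batched round trip. This is a hypothetical Java sketch of the idea only; it is not Falcor's API (Falcor is a JavaScript library with its own JSON Graph model), and the dotted path strings and batch function are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical path-keyed cache; NOT Falcor's API, only the idea behind it.
final class PathCache {
    private final Map<String, Object> cache = new HashMap<>();
    private final Function<List<String>, Map<String, Object>> remoteBatch; // one round trip for many paths
    int remoteCalls = 0;

    PathCache(Function<List<String>, Map<String, Object>> remoteBatch) {
        this.remoteBatch = remoteBatch;
    }

    Map<String, Object> get(List<String> paths) {
        List<String> misses = new ArrayList<>();
        for (String path : paths) {
            if (!cache.containsKey(path)) {
                misses.add(path);
            }
        }
        if (!misses.isEmpty()) {
            remoteCalls++;
            cache.putAll(remoteBatch.apply(misses)); // fetch only what isn't cached yet
        }
        Map<String, Object> result = new LinkedHashMap<>(); // preserves the requested order
        for (String path : paths) {
            result.put(path, cache.get(path));
        }
        return result;
    }
}
```

A second request that overlaps the first only fetches the new paths, which is where the data-access savings come from.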
Greater simplicity
Isolated failures: scripts don’t affect each other (usually)
[Diagram: one API instance shows "Temporarily unavailable!"]
Independent root causing
[Diagram: one API shows "Latency spike after push: 150ms"; another shows "Average latency: 10ms"]
Independent autoscaling
Independent insights
[Diagram: one API shows "Average latency: 50ms"; another shows "Average latency: 10ms"]
Better regression/performance testing
Tests not affected by other scripts eating up resources on the same JVM
Conclusion
Complexity and simplicity
● Product has become much more complex
○ Scripts (more scripts, more complex scripts)
○ Features
○ Number of downstream services to integrate
○ More personalization
○ etc.
● Complexity of API service is high → need to optimize for simplicity now
○ Process isolation
○ Cleaner developer experience
END

XfilesPro
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 

Recently uploaded (20)

GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 

The new Netflix API

  • 1. The new Netflix API Why more complexity must lead to more simplicity Katharina Probst DevNexus 2017
  • 3. Today’s architecture (diagram: Gateway → API Server JVM running groovy scripts and java client libs A … Z → Netflix microservices, with a network boundary between each tier; also labeled “Js (mostly)”)
  • 4. What is the Netflix
  • 6. Raison d’Être: Is the API just one gigantic translation layer? Is it a routing layer? If it’s too complex, can we just get rid of it?
  • 7. Raison d’Être: 1. Orchestration 2. Availability protection 3. Abstraction
  • 14. Search request → response ● Search service provides related search terms ● Search service provides IDs for videos and people ○ IDs depend on various factors, e.g., different catalogs in different countries ● For each ID, we need metadata ○ Titles ○ Images ○ Names ○ Ratings ○ etc. ● Metadata, in turn, depends on ○ Country ○ A/B tests the user is in ○ etc. Response: ❏ Hydrated videos ❏ People names ❏ Query suggestions
  • 15. Orchestration ● Own order of operations ● Provide whatever info clients/services need ○ From other clients/libraries/services ○ From request ● Merge partial results ● Filter results ● Retrieve more info if necessary ● Support mutations (e.g., profile switch) ● Support complex transactions in a limited way
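The search example and the orchestration duties above can be sketched in Java. This is a minimal illustration under stated assumptions, not the Netflix API's code: the three services are hypothetical functions, the names `SearchHits`, `Video`, `SearchResponse`, and `SearchOrchestrator` are invented, A/B-test handling and fallbacks are left out, and records require Java 16+. It shows the order of operations (IDs first), a concurrent fan-out to hydrate each ID, and merging plus filtering of the partial results.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical types and services, for illustration only (not Netflix code).
record SearchHits(List<String> relatedTerms, List<Integer> videoIds, List<Integer> personIds) {}
record Video(int id, String title, boolean availableInCountry) {}
record SearchResponse(List<Video> videos, List<String> people, List<String> suggestions) {}

class SearchOrchestrator {
    private final Function<String, SearchHits> searchService; // returns IDs + related terms
    private final Function<Integer, Video> videoService;       // hydrates one video ID
    private final Function<Integer, String> peopleService;     // hydrates one person ID

    SearchOrchestrator(Function<String, SearchHits> searchService,
                       Function<Integer, Video> videoService,
                       Function<Integer, String> peopleService) {
        this.searchService = searchService;
        this.videoService = videoService;
        this.peopleService = peopleService;
    }

    SearchResponse search(String query) {
        // Order of operations: metadata lookups need the IDs, so search runs first.
        SearchHits hits = searchService.apply(query);

        // Fan out: start hydration of every video and person ID concurrently.
        List<CompletableFuture<Video>> videoFutures = hits.videoIds().stream()
            .map(id -> CompletableFuture.supplyAsync(() -> videoService.apply(id)))
            .collect(Collectors.toList());
        List<CompletableFuture<String>> personFutures = hits.personIds().stream()
            .map(id -> CompletableFuture.supplyAsync(() -> peopleService.apply(id)))
            .collect(Collectors.toList());

        // Merge partial results, filtering out titles not in this country's catalog.
        List<Video> videos = videoFutures.stream()
            .map(CompletableFuture::join)
            .filter(Video::availableInCountry)
            .collect(Collectors.toList());
        List<String> people = personFutures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
        return new SearchResponse(videos, people, hits.relatedTerms());
    }

    public static void main(String[] args) {
        SearchOrchestrator orchestrator = new SearchOrchestrator(
            q -> new SearchHits(List.of(q + " movies"), List.of(1, 2), List.of(7)),
            id -> new Video(id, "Title " + id, id != 2), // pretend video 2 is not in this catalog
            id -> "Person " + id);
        System.out.println(orchestrator.search("drama"));
    }
}
```

All futures are created before any `join`, so the hydration calls overlap; a real orchestration layer would also need per-call timeouts and fallbacks.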
  • 18. Prevent this as much as possible
  • 19. What do customers want? ● No personalized recommendations, or no ability to stream? ● No search, or no ability to continue watching the movie you started last night? ● No cutting-edge A/B experiment experience, or no ability to stream?
  • 20. Top priority: customer experience ● Top priority of top priority: customer can stream videos ● This means API cannot go down entirely ○ If it does, we have an outage ● But some services are not critical to this mission ○ A/B - if we don’t know what A/B tests you’re in, you can still get the default experience ○ Search - if you can’t search, you can still browse
  • 21. Exposure to failures ● As your app grows, your set of dependencies is much more likely to get bigger, not smaller ● Overall uptime = (Dep uptime)^(num deps)
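A quick worked example of the uptime formula, assuming every request touches every dependency and failures are independent (both simplifications). The class name `DependencyUptime` is illustrative.

```java
// Worst-case arithmetic from the slide: if every request needs every dependency
// and failures are independent, overall uptime = depUptime ^ numDeps.
class DependencyUptime {
    static double overall(double depUptime, int numDeps) {
        return Math.pow(depUptime, numDeps);
    }

    public static void main(String[] args) {
        for (int deps : new int[] {1, 10, 30, 100}) {
            System.out.printf("%3d deps at 99.99%% each -> %.2f%% overall%n",
                deps, 100 * overall(0.9999, deps));
        }
    }
}
```

With 30 dependencies at 99.99% each, overall uptime drops to about 99.70%, so roughly 0.3% of requests fail; over a 30-day month (43,200 minutes) that is about 130 minutes of effective downtime, even though each dependency looks excellent in isolation.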
  • 22. Hystrix ● Fault-tolerance pattern as a library ● Provides operational insights in real time ● Automatic load-shedding under pressure
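To make the Hystrix bullets concrete, here is a hypothetical, dependency-free Java sketch of the underlying pattern. It is not Hystrix's API (Hystrix wraps calls in `HystrixCommand` subclasses and adds circuit breaking, thread pools, and a metrics stream); `GuardedCall` is an invented name. It shows three ideas from the slide: each dependency call is wrapped, concurrency is capped with a semaphore so excess load is shed to a fallback instead of queuing, and outcome counters give a basic operational signal.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch of the pattern Hystrix packages as a library (NOT Hystrix's API):
// cap concurrent calls to one dependency (bulkhead), shed load when saturated,
// fall back on failure, and count outcomes for operational insight.
class GuardedCall<T> {
    private final Semaphore permits;
    private final Supplier<T> fallback;
    private final AtomicLong successes = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();
    private final AtomicLong rejections = new AtomicLong();

    GuardedCall(int maxConcurrent, Supplier<T> fallback) {
        this.permits = new Semaphore(maxConcurrent);
        this.fallback = fallback;
    }

    T execute(Supplier<T> primary) {
        // Load shedding: don't queue behind a saturated dependency; fall back immediately.
        if (!permits.tryAcquire()) {
            rejections.incrementAndGet();
            return fallback.get();
        }
        try {
            T result = primary.get();
            successes.incrementAndGet();
            return result;
        } catch (RuntimeException e) {
            failures.incrementAndGet();
            return fallback.get();
        } finally {
            permits.release();
        }
    }

    String metrics() {
        return "success=" + successes.get() + " failure=" + failures.get()
            + " rejected=" + rejections.get();
    }

    public static void main(String[] args) {
        GuardedCall<String> ratings = new GuardedCall<>(10, () -> "no rating");
        System.out.println(ratings.execute(() -> "4.5 stars"));
        System.out.println(ratings.execute(() -> { throw new IllegalStateException("down"); }));
        System.out.println(ratings.metrics());
    }
}
```

Hystrix additionally opens a circuit breaker when a dependency's error rate crosses a threshold, so calls skip straight to the fallback for a while; this sketch leaves that out.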
  • 23. Availability protection (diagram: Gateway → API, where scripts call client libraries such as the Search, Ratings, and Cust client libs → Search, Ratings, Customers, and other services, with a network boundary between each tier)
  • 26. If you don’t plan for failure (diagram: the same Gateway → API → Search, Ratings, Customers topology)
  • 27. If you do plan for failure (diagram: the same Gateway → API → Search, Ratings, Customers topology). No search results >> no Netflix
  • 28. Fallbacks (diagram: the same Gateway → API → Search, Ratings, Customers topology): return static or stale rating
  • 30. How to handle errors
```java
try {
    return getRatings(id);
} catch (Exception ex) {
    // static value
    return null;
}
```
  • 31. How to handle errors
```java
try {
    return getRatings(id);
} catch (Exception ex) {
    // TODO What to return here?
}
```
  • 32. Handle errors with fallbacks ● Some options for fallbacks ○ Static value ○ Value from in-memory ○ Value from cache ○ Value from network ○ Throw ○ Code ● Make error-handling explicit ● Applications have to work in the presence of either fallbacks or rethrown exceptions
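The fallback options above can be made explicit in code. This is a minimal sketch, assuming a hypothetical remote ratings call. Instead of returning `null` as in slide 30, each result records whether it came from the live service, a stale in-memory copy, or a static default, so callers can work in the presence of fallbacks. `Source`, `Rating`, `RatingsWithFallback`, and `DEFAULT_RATING` are illustrative names; the cache and network fallback tiers from the list are omitted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Which path produced the value, so callers can see that a fallback happened.
enum Source { LIVE, STALE, STATIC }

record Rating(double value, Source source) {}

// Hypothetical ratings lookup with explicit, ordered fallbacks:
// live call -> last known (stale) value held in memory -> static default.
class RatingsWithFallback {
    static final double DEFAULT_RATING = 0.0; // hypothetical static fallback value
    private final Map<Integer, Double> lastKnown = new ConcurrentHashMap<>();
    private final IntFunction<Double> liveRatings; // hypothetical remote ratings call

    RatingsWithFallback(IntFunction<Double> liveRatings) {
        this.liveRatings = liveRatings;
    }

    Rating getRating(int videoId) {
        try {
            double value = liveRatings.apply(videoId); // a null result unboxes to an NPE and falls back
            lastKnown.put(videoId, value);             // remember it for future stale fallbacks
            return new Rating(value, Source.LIVE);
        } catch (RuntimeException ex) {
            Double stale = lastKnown.get(videoId);
            if (stale != null) {
                return new Rating(stale, Source.STALE);
            }
            return new Rating(DEFAULT_RATING, Source.STATIC);
        }
    }

    public static void main(String[] args) {
        boolean[] down = {false};
        RatingsWithFallback ratings = new RatingsWithFallback(id -> {
            if (down[0]) throw new IllegalStateException("ratings service down");
            return 4.5;
        });
        System.out.println(ratings.getRating(1)); // live
        down[0] = true;
        System.out.println(ratings.getRating(1)); // stale
        System.out.println(ratings.getRating(2)); // static
    }
}
```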
  • 34. Availability protection beyond Hystrix ● Throttling ● Retries ● Timeouts ● Canaries ● Regional rollouts ● Traffic shifting ● Outlier detection (and elimination) ● Advanced load balancing
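Two of the mechanisms listed above, timeouts and retries, can be sketched with the JDK alone. This hypothetical helper (not a Netflix, Hystrix, or Zuul API; `RetryWithTimeout` is an invented name) bounds each attempt with `CompletableFuture.orTimeout` (Java 9+) and retries with linear backoff. Throttling, canaries, outlier detection, and load balancing are out of scope.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper (not a Netflix or Hystrix API): bound each attempt with a
// timeout and retry a fixed number of times with linear backoff.
class RetryWithTimeout {
    static <T> T call(Supplier<T> op, long timeoutMs, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        CompletionException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return CompletableFuture.supplyAsync(op)
                    .orTimeout(timeoutMs, TimeUnit.MILLISECONDS) // Java 9+
                    .join();
            } catch (CompletionException e) {
                // Both failures and timeouts arrive here; getCause() says which.
                last = e;
                if (attempt < maxAttempts) sleep(backoffMs * attempt);
            }
        }
        throw last;
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", ie);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("flaky");
            return "ok";
        }, 500, 3, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

`orTimeout` only completes the future exceptionally; the timed-out task keeps running on the common pool, so a production version would use its own executor and an interruptible operation. Retries are also only safe for idempotent calls, and unbounded retries can amplify load on a struggling service, which is why the attempt count here is fixed.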
  • 36. Abstraction goals ● Shield all device teams from every single mid-tier change … at least for a time, which allows us to move more independently ● Shield all device teams from every single platform/infrastructure change ● Provide APIs not provided by downstream services ○ Find all movies that... ○ Length of movie ● Implementation flexibility, e.g., ○ Caching ○ Batch APIs
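As an illustration of providing an API that no downstream service offers (“find all movies that…”, “length of movie”), here is a hypothetical Java sketch. The catalog and runtime services are invented stand-ins, and `MovieQueries` and `moviesShorterThan` are illustrative names. Device teams would depend only on the method signature, so the layer could switch to caching or a different batch call without device changes.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical higher-level API that no single downstream service offers:
// "find all movies shorter than N minutes" = catalog IDs + runtime metadata.
class MovieQueries {
    private final Supplier<List<Integer>> catalogIds;                      // hypothetical catalog service
    private final Function<List<Integer>, Map<Integer, Integer>> runtimes; // hypothetical batch metadata call

    MovieQueries(Supplier<List<Integer>> catalogIds,
                 Function<List<Integer>, Map<Integer, Integer>> runtimes) {
        this.catalogIds = catalogIds;
        this.runtimes = runtimes;
    }

    // Device teams code against this signature; how it is computed (one batch call
    // today, perhaps a cache or a new mid-tier service later) can change underneath.
    List<Integer> moviesShorterThan(int maxMinutes) {
        List<Integer> ids = catalogIds.get();
        Map<Integer, Integer> minutesById = runtimes.apply(ids); // one batched call, not one per ID
        return ids.stream()
            // IDs with no known runtime are excluded.
            .filter(id -> minutesById.getOrDefault(id, Integer.MAX_VALUE) < maxMinutes)
            .toList();
    }

    public static void main(String[] args) {
        MovieQueries queries = new MovieQueries(
            () -> List.of(1, 2, 3),
            ids -> Map.of(1, 85, 2, 130, 3, 95));
        System.out.println(queries.moviesShorterThan(100)); // [1, 3]
    }
}
```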
  • 37. Abstraction challenges ● Tech debt ● Device teams can have a black-box view (“api == cloud”) ● But isn’t the API team the bottleneck? ○ Yes, sometimes. But organizational structure makes this less of a problem than m mid-tier teams dealing with n device teams ● But: separation of concerns
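One way to read the m-versus-n remark, assuming that without a central API every mid-tier team would coordinate directly with every device team, is to count coordination pairings. The team counts in the second line are purely illustrative, not Netflix figures.

$$
\text{pairings without a shared API layer} = m \cdot n, \qquad \text{pairings with it} = m + n
$$

$$
m = 20,\; n = 10: \quad m \cdot n = 200 \quad \text{vs.} \quad m + n = 30
$$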
  • 39. Reminder: Today’s architecture (diagram: Gateway → API server with groovy scripts (~2100 active) and client libs A … Z → Netflix microservices, with a network boundary between each tier)
  • 40. Device teams write server-side logic ● Decoupling teams → better velocity ● UI teams are empowered to ○ Change presentation ○ Filter ○ Add users to A/B tests, which then leads to, e.g., a different layout
  • 41. What if we didn’t have an API?
  • 42. What if? Implications for device teams (diagram: today’s architecture without the API server: Gateway, scripts, and client libs A … Z in front of Netflix microservices). Device teams own client-side applications …
  • 43. What if? Implications for device teams (diagram: architecture without the API server): ...and groovy scripts
  • 44. What if? Implications for device teams ● Each device team would have to own ○ Orchestration ○ Frequent dependency updates (currently attempted daily) ○ Higher-level APIs (all movies that…) ○ Fallbacks and other resiliency protection (e.g., timeouts, retries) ● Recent example ○ A library upgrade caused a lot of NPEs. Why? ○ Worked with the team to find out ○ Once fixed, the NPEs stopped, but performance degraded instead ● Should all teams be dealing with this?
  • 45. What if? Implications for service teams (diagram: architecture without the API server): Service teams own services...
  • 46. What if? Implications for service teams (diagram: architecture without the API server): ...and client libraries
  • 47. What if? Implications for service teams ● Can only make breaking changes if all device teams who use their service upgrade ● Don’t get resiliency protection (e.g., timeouts, load balancing, retries, fallbacks) unless all device teams who use their service provide it ● Should all teams be dealing with this?
  • 48. What if? Implications for Netflix ● Lower velocity due to tight coupling between many mid-tier teams and many device teams
  • 49. OR: THE DOWNSIDE OF CENTRALIZATION
  • 50. Where are we today? ● Principle: don’t repeat logic ○ It’s better to do it once in the API than n times for n devices. ● Principle is good, but leads to complexity
  • 52. Complexity challenges ● Frequent (not always canaried) updates to a critical service in production ● Difficulty of debugging (esp. for groovy script writers) ● Slow server startup times ● Lack of operational insights into script resource consumption ● Difficulty of performance profiling ● Lack of feedback loop ● Decoupled code versioning and transitive dependencies
  • 53. Where are we going next?
  • 54. Top priorities ● Move groovy scripts out ● Split up API
  • 55. New architecture: Edge PaaS (diagram labels: Gateway, EAS, Node apps on NodeQuark running in Titus, two sets of client libs A … Z, Netflix microservices, network boundaries between tiers)
  • 56. New architecture: Edge PaaS (diagram labels: Gateway, EAS, Node apps on NodeQuark running in Titus, client libs A … Z, Netflix microservices). Edge Auth Service (EAS): ● Auth termination ● Centralized place for auth. Edge PaaS: ● Platform for node scripts ● Developer tooling for entire SDLC ● Remote API with optimized data access (Falcor)
  • 58. Two (or more) APIs: split API by function (diagram labels: Gateway, EAS, Node apps on NodeQuark running in Titus; PB services A … Z, DNA services A … Z, and shared services A … Z, each reached through its own client)
  • 60. Platform for node scripts (diagram: the NodeQuark Platform within the new architecture; labels: Zuul, EAS, Node apps on NodeQuark running in Titus, client libs A … Z, java, Netflix microservices)
  • 61. Edge PaaS: Node Platform ● Node apps run in containers on Titus platform ● Node Platform provides ○ Integration into Netflix ecosystem (e.g., discovery) ○ Logging ○ Dashboards, metrics out of the box with option to customize ○ Support for mocking and testing ● Titus provides ○ Scheduling ○ Autoscaling
  • 63. Developer tooling for entire SDLC (diagram: New architecture: Edge PaaS; labels: Gateway, EAS, Node apps on NodeQuark running in Titus, client libs A … Z, java, Netflix microservices)
  • 64. Edge PaaS: Developer tooling ● Command line tool for node apps ○ Setup ○ Starting apps ○ Deploying apps ● Local development and debugging of node apps ● UI for lifecycle management, e.g., version management ● One-click rollouts, one-click rollbacks ● Versioning
  • 66. Remote API with optimized data access (diagram: New architecture: Edge PaaS; labels: Zuul, EAS, Node apps on NodeQuark running in Titus, client libs A … Z, Netflix microservices)
  • 67. Edge PaaS: Remote API ● API still takes care of ○ Orchestration ○ Resiliency protection ○ Abstraction ● Optimized access with Falcor ○ “RESTful composition” with caching ● Binary transport ● Future: channel support
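Falcor is a JavaScript library, and the Java sketch below does not reproduce its API. It only illustrates the idea behind “RESTful composition with caching” under simplifying assumptions: data is requested by path (plain strings such as `videos.123.title` here, rather than Falcor's JSON Graph paths), paths already in a local cache are served locally, and all misses go out in one batched remote request. `PathCache` and the remote function are hypothetical, and the class is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Loosely inspired by Falcor's path-based requests against a local cache.
// NOT Falcor's API; the remote function is a hypothetical batched fetch.
class PathCache {
    private final Map<String, Object> cache = new HashMap<>();
    private final Function<List<String>, Map<String, Object>> remote;
    int remoteRequests = 0; // exposed for illustration

    PathCache(Function<List<String>, Map<String, Object>> remote) {
        this.remote = remote;
    }

    Map<String, Object> get(List<String> paths) {
        // Collect each uncached path once.
        List<String> misses = new ArrayList<>();
        for (String p : paths) {
            if (!cache.containsKey(p) && !misses.contains(p)) misses.add(p);
        }
        if (!misses.isEmpty()) {
            remoteRequests++;                   // one round trip for all misses
            cache.putAll(remote.apply(misses));
        }
        Map<String, Object> result = new HashMap<>();
        for (String p : paths) {
            if (cache.containsKey(p)) result.put(p, cache.get(p));
        }
        return result;
    }

    public static void main(String[] args) {
        PathCache model = new PathCache(paths -> {
            Map<String, Object> fetched = new HashMap<>();
            for (String p : paths) fetched.put(p, "value of " + p); // stand-in for a remote response
            return fetched;
        });
        model.get(List.of("videos.123.title", "videos.123.rating"));
        model.get(List.of("videos.123.title")); // served from the cache
        System.out.println("remote requests: " + model.remoteRequests); // 1
    }
}
```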
  • 69. Isolated failures: scripts don’t affect each other (usually) (diagram label: “API Temporarily unavailable!”)
  • 70. Independent root causing (diagram: latency spike after push: 150 ms; average latency: 10 ms)
  • 73. Better regression/performance testing: tests are not affected by other scripts eating up resources on the same JVM
  • 75. Complexity and simplicity ● Product has become much more complex ○ Scripts (more scripts, more complex scripts) ○ Features ○ Number of downstream services to integrate ○ More personalization ○ etc. ● Complexity of API service is high → Need to optimize for simplicity now ○ Process isolation ○ Cleaner developer experience
  • 76. END