The new Netflix API
Why more complexity must lead to more simplicity
Katharina Probst
DevNexus 2017
Today’s architecture
[Diagram: Gateway → API Server JVM (groovy scripts plus client libs A … Z) → Netflix microservices, with a network boundary between each tier. Language labels: JS, (mostly) Java, Groovy.]
What is the Netflix API?
Raison d’Être
Is the API just one gigantic translation layer?
Is it a routing layer?
If it’s too complex, can we just get rid of it?
Raison d’Être
1. Orchestration
2. Availability protection
3. Abstraction
1. Orchestration
Simple example: search
[Diagram: search request → response, combining RelatedTerms, People, and Titles]
● Search service provides related search terms
● Search service provides IDs for videos and people
○ IDs depend on various factors, e.g., different catalogs in different countries
● For each ID, we need metadata
○ Titles
○ Images
○ Names
○ Ratings
○ etc.
● ..., which depend on
○ Country
○ A/B tests user is in
○ etc.
Response:
❏ Hydrated videos
❏ People names
❏ Query suggestions
Orchestration
● Own order of operations
● Provide whatever info clients/services need
○ From other clients/libraries/services
○ From request
● Merge partial results
● Filter results
● Retrieve more info if necessary
● Support mutations (e.g., profile switch)
● Support complex transactions in a limited way
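A minimal sketch of these orchestration duties applied to the search example, in plain Java with `CompletableFuture`. The three service functions (`searchIds`, `relatedTerms`, `titleLookup`) are hypothetical stand-ins for downstream client libraries, not Netflix APIs, and a real implementation would also pass country and A/B context along. The sketch shows owning the order of operations (metadata only after IDs), running independent calls in parallel, filtering, and merging partial results.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch: orchestrating one search request across hypothetical downstream services.
class SearchOrchestrationSketch {
    static final class Video {
        final String id;
        final String title;
        Video(String id, String title) { this.id = id; this.title = title; }
    }

    static final class SearchResponse {
        final List<Video> videos;
        final List<String> suggestions;
        SearchResponse(List<Video> videos, List<String> suggestions) {
            this.videos = videos;
            this.suggestions = suggestions;
        }
    }

    private final Function<String, List<String>> searchIds;    // hypothetical: query -> video IDs
    private final Function<String, List<String>> relatedTerms; // hypothetical: query -> suggestions
    private final Function<String, String> titleLookup;        // hypothetical: ID -> title, or null

    SearchOrchestrationSketch(Function<String, List<String>> searchIds,
                              Function<String, List<String>> relatedTerms,
                              Function<String, String> titleLookup) {
        this.searchIds = searchIds;
        this.relatedTerms = relatedTerms;
        this.titleLookup = titleLookup;
    }

    SearchResponse search(String query) {
        // Calls that do not depend on each other run in parallel.
        CompletableFuture<List<String>> idsF =
                CompletableFuture.supplyAsync(() -> searchIds.apply(query));
        CompletableFuture<List<String>> termsF =
                CompletableFuture.supplyAsync(() -> relatedTerms.apply(query));
        // Order of operations: metadata lookups start only once the IDs are known.
        CompletableFuture<List<Video>> videosF = idsF.thenCompose(this::hydrate);
        // Merge the partial results into one response.
        return videosF.thenCombine(termsF, SearchResponse::new).join();
    }

    private CompletableFuture<List<Video>> hydrate(List<String> ids) {
        List<CompletableFuture<Video>> perId = ids.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> new Video(id, titleLookup.apply(id))))
                .collect(Collectors.toList());
        return CompletableFuture.allOf(perId.toArray(new CompletableFuture<?>[0]))
                .thenApply(done -> perId.stream()
                        .map(CompletableFuture::join)
                        .filter(v -> v.title != null) // filter out IDs with no metadata
                        .collect(Collectors.toList()));
    }
}
```

Device clients see one response; which services were called, and in what order, stays inside the API.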
2. Availability protection
Prevent this as much as possible
What do customers want?
● No personalized recommendations, or no ability to stream?
● No search, or no ability to continue watching the movie you started last night?
● No cutting-edge A/B experiment experience, or no ability to stream?
Top priority: customer experience
● Top priority of top priority: customer can stream videos
● This means the API cannot go down entirely
○ If it does, we have an outage
● But some services are not critical to this mission
○ A/B - if we don’t know what A/B tests you’re in, you can still get the default experience
○ Search - if you can’t search, you can still browse
Exposure to failures
● As your app grows, your set of dependencies is much more likely to get bigger, not smaller
● Overall uptime = (dependency uptime)^(number of dependencies), assuming independent failures and equal uptime for every dependency
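A small worked example of the uptime formula. The inputs (30 dependencies at 99.99% each) are hypothetical, and the formula assumes every request needs every dependency and that failures are independent. With those numbers the overall uptime comes out at roughly 99.7%, a little over two hours of downtime in a 30-day month.

```java
// Sketch: overall uptime when every request depends on all dependencies.
// Assumes independent failures and the same uptime for every dependency.
class UptimeSketch {
    static double overallUptime(double depUptime, int numDeps) {
        return Math.pow(depUptime, numDeps);
    }

    public static void main(String[] args) {
        double overall = overallUptime(0.9999, 30);      // hypothetical: 30 deps at 99.99% each
        double downtimeHours = (1 - overall) * 30 * 24;  // over a 30-day month
        System.out.printf("overall uptime: %.2f%%, downtime: %.1f hours/month%n",
                overall * 100, downtimeHours);
    }
}
```

Even with very reliable dependencies, the exponent grows with every new dependency, which is why isolating failures matters more than perfecting any single service.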
Hystrix
● Fault-tolerance pattern as a library
● Provides operational insights in real time
● Automatic load-shedding under pressure
Availability protection
[Diagram: Gateway → API (scripts; Search client lib, Ratings client lib, Cust client lib, client libs B … Z) → Search, Ratings, Customers, … services, with network boundaries between tiers.]
If you don’t plan for failure
[Diagram: the same architecture; API → Search, Ratings, Customers, … services.]
If you do plan for failure
[Diagram: the same architecture, annotated “No search results >> no Netflix”.]
Fallbacks
[Diagram: the same architecture, annotated “Return static or stale rating”.]
How to handle errors
    return getRatings(id);

    try {
        return getRatings(id);
    } catch (Exception ex) {
        // static value
        return null;
    }

    try {
        return getRatings(id);
    } catch (Exception ex) {
        // TODO What to return here?
    }
Handle errors with fallbacks
● Some options for fallbacks
○ Static value
○ Value from in-memory
○ Value from cache
○ Value from network
○ Throw
○ Code
● Make error-handling explicit
● Applications have to work in the presence of either fallbacks or rethrown exceptions
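A minimal sketch, in plain Java without Hystrix, of making the fallback explicit for the `getRatings(id)` example: try the remote call, then fall back to a stale in-memory value, then to a static default. `remoteCall`, `DEFAULT_RATING`, and the `double` rating type are assumptions for illustration. With Hystrix itself, the same logic would typically live in a command's `run()` and `getFallback()` methods.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: explicit fallback chain around a remote ratings lookup.
class RatingsFallbackSketch {
    static final double DEFAULT_RATING = 3.0; // hypothetical static fallback value

    private final Function<String, Double> remoteCall; // stands in for getRatings(id)
    private final Map<String, Double> lastKnown = new ConcurrentHashMap<>();

    RatingsFallbackSketch(Function<String, Double> remoteCall) {
        this.remoteCall = remoteCall;
    }

    double getRating(String id) {
        try {
            Double rating = remoteCall.apply(id);
            if (rating == null) {
                throw new IllegalStateException("no rating returned for " + id);
            }
            lastKnown.put(id, rating); // remember the last good value
            return rating;
        } catch (RuntimeException ex) {
            // Fallback 1: stale value from memory, if we have one.
            Double stale = lastKnown.get(id);
            if (stale != null) {
                return stale;
            }
            // Fallback 2: static value, so callers never see the exception.
            return DEFAULT_RATING;
        }
    }
}
```

Unlike the `return null` version, callers always get a usable rating, and the choice of fallback is visible in one place.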
Availability protection beyond Hystrix
● Throttling
● Retries
● Timeouts
● Canaries
● Regional rollouts
● Traffic shifting
● Outlier detection (and elimination)
● Advanced load balancing
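Two items from this list, timeouts and retries, can be sketched in plain Java: each attempt gets a deadline, a bounded number of attempts is made, and a fallback is returned if all of them fail. `callWithRetries` and its parameters are hypothetical names for illustration, not part of Hystrix or any Netflix library.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Sketch: bounded retries, each attempt guarded by a timeout, then a fallback.
class RetryTimeoutSketch {
    // Daemon threads so an abandoned slow call cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(runnable -> {
        Thread t = new Thread(runnable);
        t.setDaemon(true);
        return t;
    });

    static <T> T callWithRetries(Supplier<T> call, long timeoutMs, int maxAttempts, T fallback) {
        Callable<T> task = call::get;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Future<T> future = POOL.submit(task);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // give up on this attempt and retry
            } catch (ExecutionException e) {
                // the call itself failed; retry
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return fallback;
    }
}
```

Retries multiply load on a struggling service, so in practice they are usually combined with throttling and only applied to idempotent calls.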
3. Abstraction
Abstraction goals
● Shield all device teams from every single mid-tier change … at least for a time. This allows us to move more independently
● Shield all device teams from every single platform/infrastructure change
● Provide APIs not provided by downstream services
○ Find all movies that...
○ Length of movie
● Implementation flexibility, e.g.,
○ Caching
○ Batch APIs
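The implementation-flexibility point can be sketched as a small facade: callers ask for titles by ID, and the API layer decides to serve cached values and to fetch all cache misses in a single batch call. `TitleFacadeSketch` and `batchLookup` are hypothetical stand-ins for a mid-tier batch endpoint, and the unbounded, non-thread-safe `HashMap` cache is a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Sketch: a stable facade that hides caching and batching from its callers.
class TitleFacadeSketch {
    private final Function<List<String>, Map<String, String>> batchLookup; // hypothetical batch call
    private final Map<String, String> cache = new HashMap<>();              // unbounded, not thread-safe
    int backendCalls = 0;

    TitleFacadeSketch(Function<List<String>, Map<String, String>> batchLookup) {
        this.batchLookup = batchLookup;
    }

    Map<String, String> getTitles(List<String> ids) {
        // Collect cache misses, de-duplicated, preserving request order.
        Set<String> misses = new LinkedHashSet<>();
        for (String id : ids) {
            if (!cache.containsKey(id)) {
                misses.add(id);
            }
        }
        // One batched downstream call for all misses instead of one call per ID.
        if (!misses.isEmpty()) {
            backendCalls++;
            cache.putAll(batchLookup.apply(new ArrayList<>(misses)));
        }
        // Answer from the cache; IDs the backend did not know are omitted.
        Map<String, String> result = new LinkedHashMap<>();
        for (String id : ids) {
            String title = cache.get(id);
            if (title != null) {
                result.put(id, title);
            }
        }
        return result;
    }
}
```

Because callers only see `getTitles`, the caching and batching strategy can change later without touching device code.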
Abstraction challenges
● Tech debt
● Device teams can have black-box view (“api == cloud”)
● But isn’t the API team the bottleneck?
○ Yes, sometimes. But organizational structure makes this less of a problem than m mid-tier teams dealing with n device teams
● But: separation of concerns
Server-side logic
[Diagram: Reminder: Today’s architecture. Gateway → API (scripts, ~2100 active; client libs A … Z) → Netflix microservices, with network boundaries between tiers.]
Device teams write server-side logic
● Decoupling teams → better velocity
● UI teams are empowered to
○ Change presentation
○ Filter
○ Add users to A/B tests, which can then lead to, e.g., a different layout
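To illustrate what "device teams write server-side logic" can look like, here is a hypothetical endpoint script expressed as a Java interface. The real scripts were Groovy running inside the API JVM; `DeviceScript`, `Row`, `Screen`, the `abCell` parameter, and the layout names are invented for this sketch. The script filters rows and picks a presentation based on the user's A/B test cell, with no change to any mid-tier service.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: a device team's server-side script shaping API data for its own UI.
class DeviceScriptSketch {
    static final class Row {
        final String name;
        final int videoCount;
        Row(String name, int videoCount) { this.name = name; this.videoCount = videoCount; }
    }

    static final class Screen {
        final String layout;
        final List<String> rowNames;
        Screen(String layout, List<String> rowNames) { this.layout = layout; this.rowNames = rowNames; }
    }

    // Hypothetical contract between the API platform and device-owned scripts.
    interface DeviceScript {
        Screen render(List<Row> rows, int abCell);
    }

    // One hypothetical TV script: filter, cap, and choose a layout per A/B cell.
    static final DeviceScript TV_HOME = (rows, abCell) -> {
        List<String> visible = rows.stream()
                .filter(r -> r.videoCount > 0)  // filter: hide empty rows
                .limit(abCell == 2 ? 3 : 5)     // A/B cell 2 gets fewer rows
                .map(r -> r.name)
                .collect(Collectors.toList());
        String layout = abCell == 2 ? "large-artwork" : "default-grid"; // presentation choice
        return new Screen(layout, visible);
    };
}
```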
What if we didn’t have an API?
[Diagram: What if? Implications for device teams. Device teams own client-side applications … and groovy scripts.]
What if? Implications for device teams
● Each device team would have to own
○ Orchestration
○ Frequent dependency updates (currently attempted daily)
○ Higher-level APIs (all movies that…)
○ Fallbacks and other resiliency protection (e.g., timeouts, retries)
● Recent example
○ Library upgrade caused a lot of NPEs -- why?
○ Worked with team to find out why
○ When fixed, no more NPEs, but instead performance degradation
● Should all teams be dealing with this?
[Diagram: What if? Implications for service teams. Service teams own services... and client libraries.]
What if? Implications for service teams
● Can only make breaking changes if all device teams who use their service upgrade
● Don’t get resiliency protection (e.g., timeouts, load balancing, retries, fallbacks) unless all device teams who use their service provide it
● Should all teams be dealing with this?
What if? Implications for Netflix
● Lower velocity due to tight coupling between many mid-tier teams and many device teams
OR:
THE DOWNSIDE OF CENTRALIZATION
Where are we today?
● Principle: don’t repeat logic
○ It’s better to do it once in the API than to do it n times for n devices.
● Principle is good, but leads to complexity
What complexity challenges do we have?
Complexity challenges
● Frequent (not always canaried) updates to a critical service in production
● Difficulty of debugging (esp. for groovy script writers)
● Slow server startup times
● Lack of operational insights into script resource consumption
● Difficulty of performance profiling
● Lack of feedback loop
● Decoupled code versioning and transitive dependencies
Where are we going next?
Top priorities
● Move groovy scripts out
● Split up API
New architecture: Edge PaaS
[Diagram: Gateway and EAS → Node apps (NodeQuark) running on Titus, each with client libs A … Z → Netflix microservices, with network boundaries between tiers.]
Edge Auth Service (EAS)
● Auth termination
● Centralized place for auth
Edge PaaS:
● Platform for node scripts
● Developer tooling for entire SDLC
● Remote API with optimized data access (Falcor)
Two (or more) APIs
[Diagram: split API by function. Gateway and EAS → Node apps (NodeQuark) on Titus → a PB API (PB Client B, C, Z, …) and a DNA API (DNAClient A, B, Z, …), both also using shared clients (Shared Client A, C, …) → PB services A, B, C, Z, …; DNA services A, B, C, Z, …; and shared services A, B, Z, ….]
NodeQuark Platform
[Diagram: Zuul and EAS → Node apps (NodeQuark) on Titus → Java API with client libs A … Z → Netflix microservices; the Node apps are annotated “Platform for node scripts”.]
Edge PaaS: Node Platform
● Node apps run in containers on Titus platform
● Node Platform provides
○ Integration into Netflix ecosystem (e.g., discovery)
○ Logging
○ Dashboards, metrics out of the box with option to customize
○ Support for mocking and testing
● Titus provides
○ Scheduling
○ Autoscaling
Developer experience
[Diagram: New architecture: Edge PaaS, annotated “Developer tooling for entire SDLC”.]
Edge PaaS: Developer tooling
● Command line tool for node apps
○ Setup
○ Starting apps
○ Deploying apps
● Local development and debugging of node apps
● UI for lifecycle management, e.g., version management
● One-click rollouts, one-click rollbacks
● Versioning
Remote API
[Diagram: New architecture: Edge PaaS (Zuul and EAS → Node apps (NodeQuark) on Titus → API → Netflix microservices), annotated “Remote API with optimized data access”.]
Edge PaaS: Remote API
● API still takes care of
○ Orchestration
○ Resiliency protection
○ Abstraction
● Optimized access with Falcor
○ “RESTful composition” with caching
● Binary transport
● Future: channel support
Greater simplicity
Isolated failures: scripts don’t affect each other (usually)
[Diagram: API; one app shows “Temporarily unavailable!”]
Independent root causing
[Diagram: API; one app shows a latency spike after push of 150 ms, another an average latency of 10 ms]
Independent autoscaling
Independent insights
[Diagram: API; apps show average latencies of 50 ms and 10 ms]
Better regression/performance testing
● Tests not affected by other scripts eating up resources on the same JVM
Conclusion
Complexity and simplicity
● Product has become much more complex
○ Scripts (more scripts, more complex scripts)
○ Features
○ Number of downstream services to integrate
○ More personalization
○ etc.
● Complexity of API service is high → Need to optimize for simplicity now
○ Process isolation
○ Cleaner developer experience
END

Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 

Recently uploaded (20)

5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 

The new Netflix API

  • 1. The new Netflix API: Why more complexity must lead to more simplicity. Katharina Probst, DevNexus 2017
  • 3. Today's architecture [Diagram: Gateway → API Server JVM (groovy scripts calling java client libraries, Client lib A … Client lib Z) → Netflix microservices, with network boundaries between tiers; also labeled "Js (mostly)"]
  • 4. What is the Netflix API?
  • 6. Raison d’Être: Is the API just one gigantic translation layer? Is it a routing layer? If it’s too complex, can we just get rid of it?
  • 7. Raison d’Être: 1. Orchestration 2. Availability protection 3. Abstraction
  • 14. Search request → response ● Search service provides related search terms ● Search service provides IDs for videos and people ○ IDs depend on various factors, e.g., different catalogs in different countries ● For each ID, we need metadata ○ Titles ○ Images ○ Names ○ Ratings ○ etc. ● The metadata in turn depends on ○ Country ○ A/B tests the user is in ○ etc. Response: ❏ Hydrated videos ❏ People names ❏ Query suggestions
  • 15. Orchestration ● Own order of operations ● Provide whatever info clients/services need ○ From other clients/libraries/services ○ From request ● Merge partial results ● Filter results ● Retrieve more info if necessary ● Support mutations (e.g., profile switch) ● Support complex transactions in a limited way
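The orchestration duties listed above can be sketched in plain Java. This is a minimal, hypothetical illustration of the search example on slide 14, not Netflix's code: `SearchService`, `MetadataService`, and the record types are stand-ins, and people names and A/B-test inputs are left out for brevity. It shows owning the order of operations (search must return IDs before hydration can start), fanning out metadata lookups in parallel with `CompletableFuture`, merging the partial results, and filtering out IDs that could not be hydrated.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical downstream interfaces; real Netflix client libraries differ.
interface SearchService {
    SearchResult search(String query, String country);
}

interface MetadataService {
    // Empty if the title is not available in this country's catalog.
    Optional<Video> videoById(long id, String country);
}

record SearchResult(List<Long> videoIds, List<String> relatedTerms) {}
record Video(long id, String title) {}
record SearchResponse(List<Video> videos, List<String> suggestions) {}

final class SearchOrchestrator {
    private final SearchService search;
    private final MetadataService metadata;

    SearchOrchestrator(SearchService search, MetadataService metadata) {
        this.search = search;
        this.metadata = metadata;
    }

    SearchResponse handle(String query, String country) {
        // 1. Order of operations: IDs must come back before hydration starts.
        SearchResult result = search.search(query, country);

        // 2. Hydrate every video ID in parallel.
        List<CompletableFuture<Optional<Video>>> lookups = result.videoIds().stream()
                .map(id -> CompletableFuture.supplyAsync(() -> metadata.videoById(id, country)))
                .collect(Collectors.toList());

        // 3. Merge partial results (in search order) and filter out misses.
        List<Video> videos = lookups.stream()
                .map(CompletableFuture::join)
                .flatMap(Optional::stream)
                .collect(Collectors.toList());

        return new SearchResponse(videos, result.relatedTerms());
    }
}
```

Because the API layer owns this sequence, each device script asks for one search response instead of re-implementing the fan-out itself.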
  • 18. Prevent this as much as possible
  • 19. What do customers want? ● No personalized recommendations, or no ability to stream? ● No search, or no ability to continue watching the movie you started last night? ● No cutting-edge A/B experiment experience, or no ability to stream?
  • 20. Top priority: customer experience ● The top priority within that: the customer can stream videos ● This means the API cannot go down entirely ○ If it does, we have an outage ● But some services are not critical to this mission ○ A/B: if we don’t know which A/B tests you’re in, you still get the default experience ○ Search: if you can’t search, you can still browse
  • 21. Exposure to failures ● As your app grows, your set of dependencies is much more likely to get bigger, not smaller ● Overall uptime = (Dep uptime)^(num deps), assuming every dependency is required and fails independently
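A worked example of the slide's formula. It holds only under simplifying assumptions: every dependency has the same uptime $u$, fails independently, and is required (any single failure fails the request). The numbers below (99.99% per dependency, 30 dependencies, a 30-day month of 720 hours) are illustrative, not Netflix figures.

```latex
\begin{aligned}
U_{\text{overall}} &= u^{n} \\
U_{\text{overall}}\big|_{u = 0.9999,\; n = 30} &= 0.9999^{30} = e^{30 \ln 0.9999} \approx e^{-0.0030} \approx 0.9970 \\
\text{expected downtime per 30-day month} &\approx (1 - 0.9970) \times 720\ \text{h} \approx 2.2\ \text{h}
\end{aligned}
```

A single dependency at 99.99% would cost only about 4 minutes a month ($0.0001 \times 720\ \text{h} \approx 0.07\ \text{h}$). Fallbacks for non-critical dependencies such as search or A/B allocation effectively remove them from $n$ when the question is "can the customer stream?".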
  • 22. Hystrix ● Fault-tolerance pattern as a library ● Real-time operational insights ● Automatic load shedding under pressure
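Hystrix's own API wraps each dependency call in a `HystrixCommand` subclass that overrides `run()` and `getFallback()`. The sketch below is not Hystrix; it is a stdlib-only illustration of one core idea, a circuit breaker, under simplified assumptions: it opens after a fixed number of consecutive failures, short-circuits straight to the fallback while open, and lets one trial call through after a cool-down. The class name and thresholds are hypothetical; Hystrix itself uses rolling error-percentage windows, thread-pool or semaphore isolation, and timeouts.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Minimal circuit breaker illustrating the pattern Hystrix packages as a library.
// Simplified: counts consecutive failures instead of a rolling error percentage,
// and is not thread-safe.
final class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final Duration openDuration;
    private final Supplier<Instant> now; // injectable clock; pass Instant::now in production

    private int consecutiveFailures = 0;
    private Instant openedAt = null; // null means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, Duration openDuration, Supplier<Instant> now) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
        this.now = now;
    }

    <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        if (openedAt != null && now.get().isBefore(openedAt.plus(openDuration))) {
            return fallback.get(); // open: shed load without touching the dependency
        }
        // Closed, or cool-down elapsed (half-open): let the call through.
        try {
            T value = primary.get();
            consecutiveFailures = 0;
            openedAt = null; // success closes the circuit
            return value;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (openedAt != null || consecutiveFailures >= failureThreshold) {
                openedAt = now.get(); // trip, or re-trip after a failed trial call
            }
            return fallback.get();
        }
    }

    // True once tripped, until a successful call closes the circuit again.
    boolean isOpen() {
        return openedAt != null;
    }
}
```

While the circuit is open, a failing dependency receives no traffic at all, which is the "automatic load shedding under pressure" from the slide.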
  • 23. Availability protection [Diagram: Gateway → API (scripts; Search, Ratings, and Cust client libs alongside Client lib B … Client lib Z) → Search, Ratings, and Customers services, separated by network boundaries]
  • 26. If you don’t plan for failure [Diagram: Gateway → API (scripts, client libs) → Search, Ratings, Customers]
  • 27. If you do plan for failure [Diagram: Gateway → API (scripts, client libs) → Search, Ratings, Customers] No search results >> no Netflix (losing search is far better than losing the whole service)
  • 28. Fallbacks [Diagram: Gateway → API (scripts, client libs) → Search, Ratings, Customers] Return static or stale rating
  • 30. How to handle errors
        try {
            return getRatings(id);
        } catch (Exception ex) {
            // static value
            return null;
        }
  • 31. How to handle errors
        try {
            return getRatings(id);
        } catch (Exception ex) {
            // TODO What to return here?
        }
  • 32. Handle errors with fallbacks ● Some options for fallbacks ○ Static value ○ In-memory value ○ Value from cache ○ Value from network ○ Throw ○ Code ● Make error handling explicit ● Applications have to work in the presence of either fallbacks or rethrown exceptions
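One way to make error handling explicit, as the slide recommends, is to force every call site to state its fallbacks in order. The helper below (`Fallbacks.withFallbacks` is a hypothetical name, not a Netflix or Hystrix API) tries the primary call, then each fallback source in turn, and rethrows only when every option fails, so callers see either a value or an exception rather than a silent `null`.

```java
import java.util.function.Supplier;

// Hypothetical helper: makes the fallback decision explicit at every call site.
final class Fallbacks {
    private Fallbacks() {}

    // Tries primary first, then each fallback in order. A fallback that returns
    // null (e.g. a cache miss) or throws is skipped. If nothing produces a value,
    // the primary's exception is rethrown with later failures attached as suppressed.
    @SafeVarargs
    static <T> T withFallbacks(Supplier<T> primary, Supplier<T>... fallbacks) {
        RuntimeException failure;
        try {
            return primary.get();
        } catch (RuntimeException e) {
            failure = e;
        }
        for (Supplier<T> fallback : fallbacks) {
            try {
                T value = fallback.get();
                if (value != null) {
                    return value;
                }
            } catch (RuntimeException e) {
                failure.addSuppressed(e); // keep the trail for debugging
            }
        }
        throw failure; // throwing is itself one of the fallback options on the slide
    }
}
```

Applied to the `getRatings(id)` example, a call might read `Fallbacks.withFallbacks(() -> getRatings(id), () -> staleRatings.get(id), () -> DEFAULT_RATING)`, where `staleRatings` (an in-memory map) and `DEFAULT_RATING` (a static value) are hypothetical.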
  • 34. Availability protection beyond Hystrix ● Throttling ● Retries ● Timeouts ● Canaries ● Regional rollouts ● Traffic shifting ● Outlier detection (and elimination) ● Advanced load balancing
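Two items from this list, retries and timeouts, can be combined in a few lines of plain Java. This sketch assumes Java 9+ (`CompletableFuture.orTimeout`) and uses a hypothetical `Resilience.withRetries` helper with simple exponential backoff; it is not how Netflix implements these features. Retries should be bounded and used only for idempotent calls, since unbounded retries amplify load on a dependency that is already struggling.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class Resilience {
    private Resilience() {}

    // Runs call with a per-attempt timeout and retries up to maxAttempts times,
    // sleeping initialBackoff, then 2x, 4x, ... between attempts.
    // Note: orTimeout stops waiting for a slow attempt but does not interrupt it.
    static <T> T withRetries(Supplier<T> call, int maxAttempts,
                             Duration timeout, Duration initialBackoff) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        CompletionException last = null;
        long backoffMillis = initialBackoff.toMillis();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return CompletableFuture.supplyAsync(call)
                        .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
                        .join();
            } catch (CompletionException e) {
                last = e; // wraps either the call's own exception or a TimeoutException
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during backoff", ie);
                }
                backoffMillis *= 2;
            }
        }
        throw last;
    }
}
```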
  • 36. Abstraction goals ● Shield all device teams from every single mid-tier change, at least for a time. This allows us to move more independently ● Shield all device teams from every single platform/infrastructure change ● Provide APIs not provided by downstream services ○ Find all movies that... ○ Length of movie ● Implementation flexibility, e.g., ○ Caching ○ Batch APIs
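A sketch of the "implementation flexibility" goal, using the slide's "length of movie" example: the API layer exposes a batch call that the downstream service does not offer and puts a cache in front of it, without device scripts or the downstream service having to change. All names here (`MovieLengthService`, `MovieLengthApi`, a per-ID downstream lookup) are hypothetical; a production cache would also need expiry and size bounds.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical downstream client: one runtime lookup per call.
interface MovieLengthService {
    int runtimeMinutes(long movieId);
}

// API-layer abstraction: a batch call with a cache in front of the downstream service.
final class MovieLengthApi {
    private final MovieLengthService downstream;
    private final Map<Long, Integer> cache = new ConcurrentHashMap<>();

    MovieLengthApi(MovieLengthService downstream) {
        this.downstream = downstream;
    }

    // Returns runtimes keyed by ID in request order; each distinct ID
    // reaches the downstream service at most once across all calls.
    Map<Long, Integer> lengths(List<Long> movieIds) {
        Map<Long, Integer> result = new LinkedHashMap<>();
        for (long id : movieIds) {
            result.put(id, cache.computeIfAbsent(id, downstream::runtimeMinutes));
        }
        return result;
    }
}
```

If the downstream team later ships a real batch endpoint, only `lengths` changes; callers keep the same contract.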
  • 37. Abstraction challenges ● Tech debt ● Device teams can have a black-box view (“api == cloud”) ● But isn’t the API team the bottleneck? ○ Yes, sometimes. But the organizational structure makes this less of a problem than having m mid-tier teams deal directly with n device teams ● But: separation of concerns
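The organizational argument about m mid-tier teams and n device teams is a counting argument. If every mid-tier team had to coordinate directly with every device team, the number of team-to-team integration points grows multiplicatively; routing through a shared API layer makes it roughly additive. The team counts below are illustrative, not Netflix numbers.

```latex
\begin{aligned}
\text{direct coupling:}\quad & m \times n \ \text{team-to-team integrations} \\
\text{via a shared API:}\quad & m + n \ \text{integrations (each team integrates with the API once)} \\
m = 30,\; n = 20:\quad & 30 \times 20 = 600 \quad \text{vs.} \quad 30 + 20 = 50
\end{aligned}
```

The price of the hub is that the API team sits on every path, which is exactly the bottleneck concern the slide acknowledges.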
  • 39. Reminder: Today’s architecture [Diagram: Gateway → API (scripts, ~2100 active; Client lib A … Client lib Z) → Netflix microservices, separated by network boundaries]
  • 40. Device teams write server-side logic ● Decoupling teams → better velocity ● UI teams are empowered to ○ Change presentation ○ Filter ○ Add users to A/B tests, which can then lead to, e.g., a different layout
  • 41. What if we didn’t have an API?
  • 42. What if? Implications for device teams [Diagram: Gateway → scripts and client libs (Client lib A … Client lib Z) → Netflix microservices, with no API layer] Device teams own client-side applications …
  • 43. What if? Implications for device teams [Diagram: Gateway → scripts and client libs (Client lib A … Client lib Z) → Netflix microservices, with no API layer] ...and groovy scripts
  • 44. What if? Implications for device teams ● Each device team would have to own ○ Orchestration ○ Frequent dependency updates (currently attempted daily) ○ Higher-level APIs (all movies that…) ○ Fallbacks and other resiliency protection (e.g., timeouts, retries) ● Recent example ○ A library upgrade caused a lot of NullPointerExceptions (NPEs). Why? ○ We worked with the owning team to find out why ○ Once fixed, the NPEs were gone, but performance degraded instead ● Should all teams be dealing with this?
  • 45. What if? Implications for service teams [Diagram: Gateway → scripts and client libs (Client lib A … Client lib Z) → Netflix microservices, with no API layer] Service teams own services...
  • 46. What if? Implications for service teams [Diagram: Gateway → scripts and client libs (Client lib A … Client lib Z) → Netflix microservices, with no API layer] ...and client libraries
  • 47. What if? Implications for service teams ● Can only make breaking changes if all device teams who use their service upgrade ● Don’t get resiliency protection (e.g., timeouts, load balancing, retries, fallbacks) unless all device teams who use their service provide it ● Should all teams be dealing with this?
  • 48. What if? Implications for Netflix ● Lower velocity due to tight coupling between many mid-tier teams and many device teams
  • 49. OR: THE DOWNSIDE OF CENTRALIZATION
  • 50. Where are we today? ● Principle: don’t repeat logic ○ It’s better to do it once in the API than n times for n devices ● The principle is good, but it leads to complexity
  • 52. Complexity challenges ● Frequent (not always canaried) updates to a critical service in production ● Difficulty of debugging (esp. for groovy script writers) ● Slow server startup times ● Lack of operational insights into script resource consumption ● Difficulty of performance profiling ● Lack of feedback loop ● Decoupled code versioning and transitive dependencies
  • 53. Where are we going next?
  • 54. Top priorities ● Move groovy scripts out ● Split up API
  • 55. New architecture: Edge PaaS [Diagram: Gateway and EAS → Node apps (NodeQuark) running on Titus → client libs (Client lib A … Client lib Z) → Netflix microservices, separated by network boundaries]
  • 56. New architecture: Edge PaaS [Diagram: Gateway and EAS → Node apps (NodeQuark) on Titus → client libs → Netflix microservices] Edge Auth Service (EAS): ● Auth termination ● Centralized place for auth. Edge PaaS: ● Platform for node scripts ● Developer tooling for the entire SDLC ● Remote API with optimized data access (Falcor)
  • 58. Two (or more) APIs: split API by function [Diagram: Gateway and EAS → Node apps (NodeQuark) on Titus → separate API clusters by function, with PB clients and services (PB Service A … Z), DNA clients and services (DNA Service A … Z), and shared clients and services (Shared Service A … Z)]
  • 60. Platform for node scripts [Diagram: NodeQuark Platform. Zuul and EAS → Node apps (NodeQuark) on Titus → java client libs (Client lib A … Client lib Z) → Netflix microservices]
  • 61. Edge PaaS: Node Platform ● Node apps run in containers on the Titus platform ● Node Platform provides ○ Integration into the Netflix ecosystem (e.g., discovery) ○ Logging ○ Dashboards and metrics out of the box, with the option to customize ○ Support for mocking and testing ● Titus provides ○ Scheduling ○ Autoscaling
  • 63. Developer tooling for the entire SDLC [Diagram: New architecture: Edge PaaS. Gateway and EAS → Node apps (NodeQuark) on Titus → java client libs → Netflix microservices]
  • 64. Edge PaaS: Developer tooling ● Command line tool for node apps ○ Setup ○ Starting apps ○ Deploying apps ● Local development and debugging of node apps ● UI for lifecycle management, e.g., version management ● One-click rollouts, one-click rollbacks ● Versioning
  • 66. Remote API with optimized data access [Diagram: New architecture: Edge PaaS. Zuul and EAS → Node apps (NodeQuark) on Titus → client libs → Netflix microservices]
  • 67. Edge PaaS: Remote API ● API still takes care of ○ Orchestration ○ Resiliency protection ○ Abstraction ● Optimized access with Falcor ○ “RESTful composition” with caching ● Binary transport ● Future: channel support
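Falcor is a JavaScript library; roughly, its client-side `Model` keeps a cache of the JSON graph and requests only the paths it is missing. The Java sketch below is not Falcor's API. It only illustrates the "composition with caching" idea under simplified assumptions: a client asks for several paths at once, cached paths are answered locally, and all missing paths go out in a single batched request. `PathCache`, the path strings, and the `batchFetch` function are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration: answer cached paths locally and fetch all
// missing paths in one round trip. Not thread-safe; no cache invalidation.
final class PathCache {
    private final Function<List<String>, Map<String, Object>> batchFetch;
    private final Map<String, Object> cache = new HashMap<>();
    private int roundTrips = 0;

    PathCache(Function<List<String>, Map<String, Object>> batchFetch) {
        this.batchFetch = batchFetch;
    }

    Map<String, Object> get(List<String> paths) {
        // Collect distinct paths that are not cached yet.
        List<String> missing = new ArrayList<>();
        for (String path : paths) {
            if (!cache.containsKey(path) && !missing.contains(path)) {
                missing.add(path);
            }
        }
        if (!missing.isEmpty()) {
            roundTrips++;
            cache.putAll(batchFetch.apply(missing)); // one request for all misses
        }
        // Compose the response in request order from the cache.
        Map<String, Object> result = new LinkedHashMap<>();
        for (String path : paths) {
            result.put(path, cache.get(path));
        }
        return result;
    }

    int roundTrips() {
        return roundTrips;
    }
}
```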
  • 69. Isolated failures: scripts don’t affect each other (usually) [Diagram: API marked “Temporarily unavailable!”]
  • 70. Independent root-causing [Diagram: API; latency spike after push: 150 ms vs. average latency: 10 ms]
  • 73. Better regression/performance testing ● Tests are not affected by other scripts eating up resources on the same JVM
  • 75. Complexity and simplicity ● Product has become much more complex ○ Scripts (more scripts, more complex scripts) ○ Features ○ Number of downstream services to integrate ○ More personalization ○ etc. ● Complexity of API service is high → Need to optimize for simplicity now ○ Process isolation ○ Cleaner developer experience
  • 76. END