MicroServices at Netflix - challenges of scale

MicroServices at NETFLIX
Best Practices & Tools of the trade
Sudhir Tonse
Manager, Cloud Platform
@stonse
http://linkedin.com/in/sudhirtonse
Nitesh Kant
Platform Architect
@NiteshKant
http://linkedin.com/in/niteshkant
● Old DataCenter (2008)
● Everything in one WebApp (.war)
● AWS Cloud (2012)
● 100s of Fine Grained Services
Positives
● Isolation brings better Availability*
● Independent Speed of Delivery (by different teams)
● Decentralized Governance (DevOps)
Challenges
Distributed Systems are inherently Complex
Operational Overhead (100s of services; DevOps model absolutely required)
Service Interface Versioning, Mismatches?
Testing (Need the entire ecosystem to test)
Fan out of Requests -> Increases network traffic
Claim
MicroServices increase your overall availability
True?
Yes … but wait!
One missing “;” brought down ALL of Netflix
Introduced MicroServices ...
Uptime SLA
Assume a Monolithic Service with 99.99% availability
What if you have ...
~30 Microservices (each with 99.99% SLA)??
Reality
One rogue (dependency) micro service CAN bring your whole site down!
How?
Service Hosed!!
Combined Effective SLA (Availability)
~30 services at 99.99% each, all needed to serve a request => 0.9999^30
== 99.7% uptime!!
== ~2 HOURS of downtime per month
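The arithmetic behind the combined SLA can be checked with a short sketch. It assumes that a request needs all ~30 services to be up and that their failures are independent; neither assumption is spelled out on the slide, and the 30-day month and function names are illustrative.

```python
# Availability of a request path that needs every one of N services to be up.
# Assumption: service failures are independent of each other.
HOURS_PER_MONTH = 30 * 24  # 720 hours, using a 30-day month


def combined_availability(per_service: float, n_services: int) -> float:
    """Probability that all n_services are up at the same time."""
    return per_service ** n_services


def monthly_downtime_hours(availability: float) -> float:
    """Expected hours of downtime in a 30-day month."""
    return (1.0 - availability) * HOURS_PER_MONTH


monolith = 0.9999
combined = combined_availability(0.9999, 30)

print(f"monolith downtime per month: {monthly_downtime_hours(monolith) * 60:.1f} minutes")
print(f"combined availability:       {combined:.2%}")
print(f"combined downtime per month: {monthly_downtime_hours(combined):.2f} hours")
```

Under these assumptions the monolith loses a few minutes per month, while the chain of 30 services loses a little over two hours.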
But what if I want better?
MicroServices do not automatically mean better Availability
- Unless you have a Fault Tolerant Architecture
Guard your Service!
Use Hystrix (http://github.com/netflix/hystrix)
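To make "guard your service" concrete, here is a minimal sketch of the circuit-breaker-with-fallback pattern that Hystrix provides. It is written in plain Python and is not the Hystrix API: the class name, the thresholds, and the `flaky_recommendations` / `popular_titles` functions are all hypothetical.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Tiny circuit breaker with a fallback. Illustrative only, not the Hystrix API."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, let the next call through as a trial.
        return self.clock() - self.opened_at < self.reset_after_s

    def call(self, command: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # fail fast without touching the sick dependency
        try:
            result = command()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


stats = {"calls": 0}


def flaky_recommendations() -> List[str]:
    """Hypothetical dependency that is currently down."""
    stats["calls"] += 1
    raise TimeoutError("dependency is hosed")


def popular_titles() -> List[str]:
    """Hypothetical static fallback."""
    return ["fallback: popular titles"]


breaker = CircuitBreaker(failure_threshold=3, reset_after_s=60.0)
results = [breaker.call(flaky_recommendations, popular_titles) for _ in range(10)]
print(f"dependency was called {stats['calls']} times for {len(results)} requests")
```

Every caller still gets an answer, and once the breaker opens the failing dependency stops receiving traffic until the cool-down passes.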
Service Discovery & Loadbalancers
Choice
1. Central Loadbalancer? (H/W or S/W)
OR
2. Client based S/W Loadbalancer?
Client based Smart Loadbalancer
Use Ribbon (http://github.com/netflix/ribbon)
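A client based load balancer can be sketched as follows: the client holds the instance list it fetched from the registry and picks a server itself, with no central hop. This is plain Python, not the Ribbon API; the class, the `ratings` service name, and the addresses are made up for illustration, and the dictionary stands in for a registry fetch.

```python
from typing import Dict, List, Set


class ClientSideLoadBalancer:
    """Round-robin over instances the client already knows. Not the Ribbon API."""

    def __init__(self, registry: Dict[str, List[str]]) -> None:
        # registry maps service name -> "host:port" list (stand-in for a registry fetch)
        self._registry = registry
        self._counters: Dict[str, int] = {}
        self._down: Set[str] = set()

    def mark_down(self, instance: str) -> None:
        """Record that an instance failed so it is skipped from now on."""
        self._down.add(instance)

    def choose(self, service: str) -> str:
        """Pick the next healthy instance for the service in round-robin order."""
        candidates = [i for i in self._registry.get(service, []) if i not in self._down]
        if not candidates:
            raise LookupError(f"no healthy instance for {service!r}")
        n = self._counters.get(service, 0)
        self._counters[service] = n + 1
        return candidates[n % len(candidates)]


registry = {"ratings": ["10.0.0.1:7001", "10.0.0.2:7001", "10.0.0.3:7001"]}
lb = ClientSideLoadBalancer(registry)

first_three = [lb.choose("ratings") for _ in range(3)]
lb.mark_down("10.0.0.2:7001")  # e.g. after a connection error
after_failure = [lb.choose("ratings") for _ in range(4)]

print(first_three)
print(after_failure)
```

Because the decision is made in the client, a bad instance can be skipped immediately instead of waiting for a central load balancer to notice.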
Tools of the Trade
Service Dependency View
Distributed Tracing
Chattiness (and Fan Out)
~2 Billion Requests per day on Edge Service
…
Results in ~20 Billion Fan out requests in ~100 MicroServices
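A rough back-of-the-envelope view of these numbers is below. The per-second averages and the even spread across services are assumptions for illustration only; real traffic is neither flat over the day nor uniform across services.

```python
# Figures from the slide (all approximate).
EDGE_REQUESTS_PER_DAY = 2_000_000_000
FANOUT_REQUESTS_PER_DAY = 20_000_000_000
SERVICES = 100
SECONDS_PER_DAY = 24 * 60 * 60

# Each edge request triggers this many internal requests on average.
fanout_factor = FANOUT_REQUESTS_PER_DAY / EDGE_REQUESTS_PER_DAY

# Daily averages expressed per second (assumes flat traffic, which is unrealistic).
edge_rps = EDGE_REQUESTS_PER_DAY / SECONDS_PER_DAY
internal_rps = FANOUT_REQUESTS_PER_DAY / SECONDS_PER_DAY

# Assumes the internal load is spread evenly over all services.
avg_rps_per_service = internal_rps / SERVICES

print(f"fan-out factor:        {fanout_factor:.0f}x")
print(f"edge requests/sec:     {edge_rps:,.0f}")
print(f"internal requests/sec: {internal_rps:,.0f}")
print(f"avg per service/sec:   {avg_rps_per_service:,.0f}")
```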
Fan out
IPC 2.0 .. the next frontier
@NiteshKant
Netflix IPC Stack (1.0)
[Diagram]
Client: Hystrix, EVCache, Ribbon (Load Balancing, Eureka Integration, Metrics (Servo)), Apache HTTP Client
Server (Karyon): Apache Tomcat, Bootstrapping (Governator), Metrics (Servo), Admin Console, HTTP, Eureka Integration
Eureka (Service Registry): Registration, Fetch Registry
Netflix IPC Stack (2.0)
[Diagram]
Client (Ribbon 2.0): Ribbon, Hystrix, EVCache, Ribbon Transport (Load Balancing, Eureka Integration, Metrics (Servo)), RxNetty
Server (Karyon): RxNetty, Bootstrapping (Governator), Metrics (Servo), Admin Console, HTTP, Eureka Integration
Also shown: UDP, TCP, WebSockets, SSE
Eureka (Service Registry): Registration, Fetch Registry
Synchronous Applications
[Diagram: columns Tomcat Connector, Application code, Hystrix, Apache HTTP Client]
Conn 1: Thread 1 (Tomcat Connector), Thread 1* (Application code), Thread 1’ (Hystrix), Thread 1’ (Apache HTTP Client)
Conn 2: Thread 2 (Tomcat Connector), Thread 2* (Application code), Thread 2’ (Hystrix), Thread 2’ (Apache HTTP Client)
……....
Conn n: Thread n (Tomcat Connector), Thread n* (Application code), Thread n’ (Hystrix), Thread n’ (Apache HTTP Client)
*If there isn’t any application driven thread change
Large # of connections / Large # of external dependencies => tons of threads.
Asynchronous Applications
[Diagram: columns RxNetty, Application code, Hystrix, RxNetty; each request stage is labeled with an eventloop (Eventloop 1 to Eventloop 4)]
*If there isn’t any application driven thread change
“N” connections per eventloop
Request processing in Eventloop
Hystrix used for throttling, not for achieving asynchronicity.
Eventloops are shared between In & Out
# of processors => # of eventloops. No dependence on # of connections
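The "no dependence on # of connections" point can be illustrated with a small sketch. It uses Python's asyncio as a stand-in for RxNetty eventloops, so it shows the model rather than the Netflix stack: a single event loop is used here, whereas the slides describe one eventloop per processor, and `handle_connection` and the 10 ms sleep are placeholders for real request handling.

```python
import asyncio
import threading
from typing import Set


async def handle_connection(conn_id: int, seen_threads: Set[int]) -> int:
    """Handle one simulated connection and record which OS thread ran it."""
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(0.01)  # stands in for a non-blocking downstream call
    seen_threads.add(threading.get_ident())
    return conn_id


async def serve(n_connections: int) -> Set[int]:
    """Run all connections concurrently on the current event loop."""
    seen_threads: Set[int] = set()
    await asyncio.gather(
        *(handle_connection(i, seen_threads) for i in range(n_connections))
    )
    return seen_threads


threads_used = asyncio.run(serve(1000))
print(f"1000 connections handled on {len(threads_used)} thread(s)")
```

In the synchronous model the same 1000 connections would each hold at least one thread while waiting on the downstream call.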
Takeaway
MicroServices is a better architecture compared to Monolithic Apps
However
Be aware of the challenges - Use Best Practices and battle-tested OSS components
http://netflix.github.co