MicroServices at Netflix - challenges of scale

MicroServices have caught on as the design pattern of choice for many companies at scale. While MicroServices, and SOA in general, have many positives compared to Monolithic apps, they come with their own challenges, especially when running at scale. These slides were for a 15 min Meetup talk hosted at Cisco.

Published in: Engineering

  1. MicroServices at NETFLIX: Best Practices & Tools of the trade. Sudhir Tonse, Manager, Cloud Platform (@stonse, http://linkedin.com/in/sudhirtonse); Nitesh Kant, Platform Architect (@NiteshKant, http://linkedin.com/in/niteshkant)
  2. ● Old DataCenter (2008) ● Everything in one WebApp (.war) ● AWS Cloud (2012) ● 100s of Fine Grained Services
  3. Positives: ● Isolation brings better Availability* ● Independent Speed of Delivery (by different teams) ● Decentralized Governance (DevOps)
  4. Challenges: ● Distributed Systems are inherently Complex ● Operational Overhead (100s of services; DevOps model absolutely required) ● Service Interface Versioning, Mismatches? ● Testing (Need the entire ecosystem to test) ● Fan out of Requests -> Increases network traffic
  5. Claim: MicroServices increase your overall availability
  6. True? Yes … but wait!
  7. One missing “;” brought down ALL of Netflix
  8. Introduced MicroServices ...
  9. Uptime SLA: Assume a Monolithic Service with 99.99% availability. What if you have ~30 MicroServices (each with a 99.99% SLA)?
  10. Reality: One rogue (dependency) microservice CAN bring your whole site down!
  11. How?
  12. Service Hosed!!
  13. Combined Effective SLA (Availability) == 2 HOURS of downtime per month == 99.7% uptime!!
  14. But what if I want better? MicroServices do not automatically mean better Availability, unless you have a Fault Tolerant Architecture
  15. Guard your Service! Use Hystrix (http://github.com/netflix/hystrix)
  16. Service Discovery & Loadbalancers. Choice: 1. Central Loadbalancer? (H/W or S/W) OR 2. Client based S/W Loadbalancer?
  17. Client based Smart Loadbalancer: Use Ribbon (http://github.com/netflix/ribbon)
  18. Tools of the Trade OR
  19. Service Dependency View
  20. Distributed Tracing
  21. Chattiness (and Fan Out): ~2 Billion Requests per day on Edge Service … Results in ~20 Billion Fan out requests in ~100 MicroServices
  22. Fan out
  23. IPC 2.0 .. the next frontier @NiteshKant
  24. Netflix IPC Stack (1.0) [diagram; labels only]: Apache HTTP Client; Eureka (Service Registry); Server (Karyon); Apache Tomcat; Client; Hystrix; EVCache; Ribbon; Load Balancing; Eureka Integration; Metrics (Servo); Bootstrapping (Governator); Metrics (Servo); Admin Console; HTTP; Eureka Integration; Registration; Fetch Registry
  25. Netflix IPC Stack (2.0) [diagram; labels only]: Client (Ribbon 2.0); Eureka (Service Registry); Server (Karyon); Ribbon Transport; Load Balancing; Eureka Integration; Metrics (Servo); Bootstrapping (Governator); Metrics (Servo); Admin Console; HTTP; Eureka Integration; Registration; Fetch Registry; Ribbon; Hystrix; EVCache; RxNetty; RxNetty; UDP; TCP; WebSockets; SSE
  26. Synchronous Applications [diagram]: Tomcat Connector, Application code, Hystrix, Apache HTTP Client. Conn 1: Thread 1, Thread 1’, Thread 1*, Thread 1’. Conn 2: Thread 2, Thread 2’, Thread 2*, Thread 2’. … Conn n: Thread n, Thread n’, Thread n*, Thread n’. *If there isn’t any application driven thread change
  27. Synchronous Applications [diagram]: Tomcat Connector, Application code, Hystrix, Apache HTTP Client. Conn 1: Thread 1, Thread 1’, Thread 1*, Thread 1’. Conn 2: Thread 2, Thread 2’, Thread 2*, Thread 2’. … Conn n: Thread n, Thread n’, Thread n*, Thread n’. Large # of connections / Large # of external dependencies => tons of threads. *If there isn’t any application driven thread change
  28. Asynchronous Applications [diagram]: Application code, RxNetty, Hystrix, RxNetty. Eventloop 1, Eventloop 4, Eventloop 1*, Eventloop 4*. “N” connections per eventloop. Request processing in Eventloop. Hystrix used for throttling, not for achieving asynchronicity. Eventloops are shared between IN & OUT. *If there isn’t any application driven thread change
  29. Asynchronous Applications [diagram]: Application code, RxNetty, Hystrix, RxNetty. Eventloop 1, Eventloop 4, Eventloop 1*, Eventloop 4*. Eventloop 2, Eventloop 3, Eventloop 1*, Eventloop 4*. … Eventloop 4, Eventloop 1, Eventloop 1*, Eventloop 4*. # of processors => # of eventloops. No dependence on # of connections. *If there isn’t any application driven thread change
  30. Takeaway: MicroServices is a better architecture compared to Monolithic Apps. However, be aware of the challenges: use Best Practices and battle-tested OSS components
  31. http://netflix.github.co
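
The arithmetic behind slides 9 and 13 can be checked with a short sketch. It assumes, as a simplification not stated on the slides, that every request depends on all ~30 services and that their failures are independent, so the availabilities multiply. The function names are illustrative only.

```python
def combined_availability(per_service: float, n_services: int) -> float:
    """Availability of a request that needs all n services to be up.

    Assumes independent failures and a strict dependency on every service.
    """
    return per_service ** n_services


def downtime_hours_per_month(availability: float, days: int = 30) -> float:
    """Expected downtime in hours over a month of `days` days."""
    return (1.0 - availability) * days * 24


# ~30 services at 99.99% each, as on slide 9
combined = combined_availability(0.9999, 30)
print(f"combined availability: {combined:.2%}")
print(f"downtime per month: {downtime_hours_per_month(combined):.1f} h")
```

Under these assumptions the result is roughly 99.7% uptime, or about two hours of downtime in a 30-day month, in line with slide 13.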
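
Slide 15 recommends guarding calls to dependencies with Hystrix, a Java library that provides circuit breaking and fallbacks. The sketch below is not Hystrix's API; it is a minimal, hand-rolled Python illustration of the circuit-breaker idea, and every name in it (`CircuitBreaker`, `flaky_ratings_service`, the thresholds) is hypothetical.

```python
import time


class CircuitBreaker:
    """Fails fast after repeated errors, then retries after a cool-down."""

    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            # Open circuit: skip the dependency until the cool-down ends.
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down over: allow one trial call (half-open state).
            self.opened_at = None
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback()
        self.failures = 0  # a success closes the circuit again
        return result


def flaky_ratings_service():
    # Hypothetical dependency that is currently down.
    raise TimeoutError("dependency is down")


breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
results = [
    breaker.call(flaky_ratings_service, lambda: "default ratings")
    for _ in range(4)
]
```

After the second failure the breaker opens, so the third and fourth calls return the fallback without touching the failing dependency. That is how a single rogue service is kept from tying up its callers.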
