Rethinking Cloud Proxies

8,562 views

An edge gateway is an essential piece of infrastructure for large scale cloud based services. This presentation details the purpose, benefits and use cases for an edge gateway to provide security, traffic management and cloud cross region resiliency. How a gateway can be used to enhance continuous deployment, and help testing of new service versions and get service insights and more are discussed. Philosophical and architectural approaches to what belongs in a gateway vs what should be in services will be discussed. Real examples of how gateway services are used in front of nearly all of Netflix's consumer facing traffic will show how gateway infrastructure is used in real highly available, massive scale services.
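The description mentions using the gateway to help test new service versions. As a rough illustration only, the sketch below shows one way a gateway could pin a small, stable share of devices to a canary subcluster and an equal share to a baseline subcluster. The function names, the hashing scheme, and the percentages are assumptions made for this example and are not taken from the presentation or from Zuul.

```python
import hashlib

# Subcluster names follow the slide labels; "v1" as the default cluster is an assumption.
CANARY, BASELINE, MAIN = "canary", "baseline", "v1"
BUCKETS = 10_000


def bucket(device_id: str) -> int:
    """Map a device id to a stable bucket in [0, BUCKETS)."""
    # sha256 is used (rather than built-in hash()) so the mapping is identical
    # on every gateway instance and across restarts.
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def choose_subcluster(device_id: str, canary_pct: float = 1.0) -> str:
    """Route canary_pct percent of devices to the canary, the same share to
    the baseline, and everything else to the main cluster."""
    width = round(canary_pct * BUCKETS / 100)
    b = bucket(device_id)
    if b < width:
        return CANARY
    if b < 2 * width:
        return BASELINE
    return MAIN
```

Because the bucket depends only on the device id, a given device keeps landing on the same subcluster for as long as the percentage is unchanged, which is one plausible reading of a "sticky" canary.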

Published in: Engineering

  1. A Cloud Gateway - A Large Scale Company’s First Line of Defense. Mikey Cohen, Manager - Edge Gateway, Netflix
  2. Today, more than 36% of North America’s internet traffic is controlled by systems in the Amazon Cloud
  3. Global Streaming of TV Shows and Movies
  4. Nearly 70 Million Subscribers in over 80 Countries
  5. Netflix accounts for over 36% of Downstream Traffic in North America
  6. From the Internet to Services in the Cloud [Diagram: Gateway, Gateway, ?????; Origin (API) x4, API, Website]
  7. Our Edge Gateway @ Netflix: Handles most netflix.com hosts; Over 20 production Zuul clusters; ~50 ELBs; Gateway handles ~10 origin services
  8. Netflix Gateway Scale: Tens of billions of requests per day; 3 AWS regions; Over 1000 device types; Hundreds of permutations of protocols and device versions
  9. Success, Evolution, Scale, Failure - Our Journey
  10. So What!? - Change your perspective!!
  11. Traditional Cloud Proxy Mission: Simple static rule-based routing; API portal; Request authentication; Throttling - request caps; Monitoring; Caching
  12. The Gateway - a grown-up proxy! Dynamic routing; Deep Insights; Load balancing; Availability focused; Service protection; Quality assurance tool
  13. Evolving to a Gateway
  14. Netflix’s Public API: Late 2008; Mashery; Datacenter
  15. Streaming Devices using public API: Early Streaming Devices - 2009; Windows Media Center; XBox; PS3
  16. Migration to AWS (2010): Sonoa / Apigee proxy; Device traffic, not public; Controlling DC -> cloud migration; Running in AWS; Under Netflix control
  17. Streaming Success (2011): Chaos; Complexity; Failure; Success; Leveraging Cloud benefits
  18. Anti-patterns of most cloud proxies: Static configurations; Service push needed to change behavior; Limited range of functionality; Limited to HTTP
  19. Zuul Created (2012): Dynamically injected and compiled filters; Manipulate requests and responses; Headers / Body / etc; Change routing; Add metrics and other functions; Built on Netflix’s OSS stack; Open Sourced
  20. Zuul - A Victim of Success: Easy and convenient; Instant results; High adoption; Happy customers; Business logic in proxy; Affects system resiliency; Zuul team in critical path
  21. Creating a Gateway Strategy
  22. Principles of Netflix’s Gateway Strategy: Creative Routing; Dynamic Routing; Delivery Focused; Traffic Shaping; React Fast; Insights
  23. Creative Routing - Subclusters with Purpose [Diagram: three Gateways routing to Origin (API) subclusters: v1, v2, test, debug, canary, baseline, “sticky” canary, “sticky” baseline, FIT, instrumented, squeeze]
  24. Red / Green Deployments [Same subcluster diagram]
  25. Developer Test Branches [Same subcluster diagram]
  26. Instrumented Clusters [Same subcluster diagram]
  27. Squeeze Testing [Same subcluster diagram]
  28. Targeted Routing [Same subcluster diagram]
  29. Service “Canarying” [Same subcluster diagram]
  30. “Sticky” Canary [Same subcluster diagram]
  31. Failure Injection Testing [Same subcluster diagram]
  32. Degraded Experience Testing [Same subcluster diagram]
  33. Traffic Shaping
  34. A Global Cloud Deployment [Diagram: three regional stacks, each with Network Tier, Presentation Tier, Business services Tier, Persistence Tier; components: Proxy, Websites, API, DB]
  35. Global Cloud Routing [Same regional diagram]
  36. A Failing region [Same regional diagram]
  37. Gateway routing to other regions [Same regional diagram]
  38. Attack prevention [Diagram: three Gateways; Origin (API) x4, API, Website]
  39. Smart Load Balancing [Diagram: three Gateways; Origin (API)]
  40. Smart Load Balancing - Bad Nodes [Diagram: three Gateways; Origin (API)]
  41. Gateway Backoff and Blacklists Bad Nodes [Diagram: three Gateways; Origin (API)]
  42. Zone Failure - Blacklist the Zone automatically [Diagram: three Gateways; Origin (API)]
  43. React Quickly - Runtime Filter changes [Diagram: three Gateways; Origin (API) x4, API, Website; Runtime Policy Injection]
  44. A Room with a View - Insights [Diagram: three Gateways; Origin (API) x4, API, Website; Insights]
  45. What’s Next for Netflix’s Gateway? Gateway as a service; Self-service dynamic routing / route validation; Control APIs for special routing functions; Netty Based Zuul (using RxNetty); Handling persistent connections; non-blocking, async; Transport protocol agnostic routing; Reactive Socket http://reactivesocket.io/
  46. Top Ten Lessons Learned
  47. Build for handling Failures
  48. Expect the Unexpected
  49. Using Routing Creatively
  50. Shard to Reduce Blast Radius
  51. Devices are Weird; Protocols are Weird
  52. Devices are Forever; Protocols are Forever
  53. It will be built “wrong”
  54. Keep Business Logic out of your Gateway
  55. For More Info... Zuul OSS; Netflix Tech Blog; RxNetty; Jobs
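
The slides on smart load balancing mention backing off from and blacklisting bad nodes. The following is a minimal sketch of that general idea, not Netflix's or Zuul's implementation. The class name, the failure threshold, and the exponential backoff schedule are all assumptions chosen for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class NodeHealth:
    failures: int = 0            # consecutive failures observed
    blocked_until: float = 0.0   # node is skipped while the clock is before this


class Balancer:
    """Round-robin balancer that temporarily blacklists failing nodes (hypothetical)."""

    def __init__(self, nodes, threshold=3, base_backoff=1.0,
                 max_backoff=60.0, clock=time.monotonic):
        self.nodes = list(nodes)
        self.threshold = threshold        # failures before a node is blacklisted
        self.base_backoff = base_backoff  # first blacklist duration, in seconds
        self.max_backoff = max_backoff    # cap on the blacklist duration
        self.clock = clock                # injectable so tests can control time
        self.health = {n: NodeHealth() for n in self.nodes}
        self._next = 0

    def pick(self):
        """Return the next node that is not blacklisted, or None if all are."""
        now = self.clock()
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if self.health[node].blocked_until <= now:
                return node
        return None

    def report(self, node, ok):
        """Record the outcome of a request sent to node."""
        h = self.health[node]
        if ok:
            h.failures = 0
            h.blocked_until = 0.0
            return
        h.failures += 1
        if h.failures >= self.threshold:
            # Double the blacklist window for each failure past the threshold.
            exponent = h.failures - self.threshold
            delay = min(self.max_backoff, self.base_backoff * 2 ** exponent)
            h.blocked_until = self.clock() + delay
```

A caller would invoke `pick()` before each request and `report(node, ok)` afterwards. Blacklisting a whole zone, as slide 42 describes, could apply the same bookkeeping to a zone key instead of a node key, though the slides do not say how that is done.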
