Why do we need ludicrous scale?
- Cashless transactions have grown from 5% to 25% in a matter of months!
- IRCTC bookings have grown from 29 tickets per day to 13 lakh (1.3 million) per day!
- IHS forecasts that the IoT market will grow from an installed base of 15.4 billion devices in 2015 to 30.7 billion devices in 2020 and 75.4 billion in 2025 => huge scalability requirements on IoT applications
3.
Load Balancer Scalability – New Considerations
• SSL/TLS traffic is seeing explosive growth
• Performance myth: ultra-expensive, inflexible hardware appliances are the only solution
• Moore's law: advances in Intel x86 servers – processors, memory, and networking
• Crypto advances: RSA 2K vs. ECC encryption keys (see the sketch below)
• Software-defined architectural advances enable significant elasticity
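To make the crypto bullet concrete, here is a minimal sketch (assuming the third-party Python `cryptography` package) comparing RSA-2048 and ECDSA P-256 signing cost, the operation that dominates server-side TLS handshake CPU. The timings are illustrative, not a TLS benchmark.

```python
# RSA-2048 vs. ECDSA P-256 signing throughput (illustrative micro-benchmark).
import time
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, ec, padding

rsa_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
ec_key = ec.generate_private_key(ec.SECP256R1())
msg = b"x" * 64

def ops_per_sec(sign, n=200):
    start = time.perf_counter()
    for _ in range(n):
        sign()
    return n / (time.perf_counter() - start)

print("RSA-2048 signs/sec: ",
      ops_per_sec(lambda: rsa_key.sign(msg, padding.PKCS1v15(), hashes.SHA256())))
print("ECDSA P-256 signs/sec:",
      ops_per_sec(lambda: ec_key.sign(msg, ec.ECDSA(hashes.SHA256()))))
```

On typical x86 hardware the ECDSA rate comes out several times higher than RSA-2048, which is why ECC certificates raise the SSL TPS ceiling of a software load balancer.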
Hierarchical Load Balancers
Concept:
• Chaining of load balancing services
• Tier 1 – Layer 4 (TCP/UDP) load balancing
• Tier 2 – Layer 7 load balancing
Pros:
• Simplest approach
• May suffice for small-scale environments
Cons:
• Limited by performance of Tier-1 LB
[Diagram: Users → Tier 1 load balancer (L4) → Tier 2 load balancers (L7) → application instances]
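A toy sketch of the chaining idea (names are illustrative, not any vendor's implementation): Tier 1 hashes the flow's 5-tuple to pick a Tier-2 balancer, and Tier 2 routes on HTTP attributes.

```python
import hashlib

L7_BALANCERS = ["l7-a", "l7-b"]                  # Tier 2 (Layer 7) balancers
BACKENDS = {"/api": ["app-1", "app-2"], "/": ["web-1", "web-2"]}
_rr_state = {}                                   # round-robin cursor per pool

def tier1_pick(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Layer 4: hash the 5-tuple so every packet of a flow lands on
    the same Tier-2 balancer."""
    key = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return L7_BALANCERS[digest % len(L7_BALANCERS)]

def tier2_pick(path):
    """Layer 7: route on URL prefix, round-robin within the matched pool."""
    prefix = "/api" if path.startswith("/api") else "/"
    pool = BACKENDS[prefix]
    idx = _rr_state.get(prefix, 0)
    _rr_state[prefix] = (idx + 1) % len(pool)
    return pool[idx]

l7 = tier1_pick("10.0.0.5", 51000, "203.0.113.10", 443)
print(f"flow -> {l7} -> {tier2_pick('/api/orders')}")
```

The "Cons" bullet falls out of this structure: every byte still transits Tier 1, so the chain can never be faster than that single stage.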
6.
DNS + Proxy Load Balancers
Concept:
• DNS redirections with server mirroring
• Dynamic mapping of hostname to IP addresses
Pros:
• Easy to configure
• Scales well
Cons:
• DNS caches can become stale
[Diagram: Users query DNS and receive one of several IP addresses (IP1–IP4), each pointing at a load balancer fronting the application instances]
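A minimal look at the client side of DNS-based distribution, assuming a hostname published with multiple A records (`example.com` is a placeholder):

```python
# Resolve every IPv4 address behind a hostname; with round-robin DNS the
# rotation among these records is what spreads load across balancers.
import socket

def resolve_all(hostname, port=443):
    """Return every IPv4 address the resolver returns for the host."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

print(resolve_all("example.com"))
```

The "stale cache" con shows up exactly here: a resolver or client that caches one of these addresses past its TTL keeps sending traffic to a balancer that may already be gone.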
7.
Route Injection/Anycast Load Balancers
Concept:
• DNS resolves to a single IP
• Upstream router holds the IP address
• Router performs flow-based ECMP to next-hop load balancers
Pros:
• Can scale significantly – most routers support at least 64 next hops
Cons:
• Access to an upstream router is needed
[Diagram: Users → router (flow-based ECMP) → load balancers → application instances]
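A sketch of flow-based ECMP as a router might apply it. The 5-tuple hash below is an assumption for illustration; real routers use vendor-specific hash functions.

```python
# Flow-based ECMP: every packet of a flow maps to the same next-hop
# load balancer, and different flows spread across up to N next hops.
import hashlib

NEXT_HOPS = [f"lb-{i}" for i in range(64)]   # many routers support >= 64

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto=6):
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    h = int.from_bytes(hashlib.md5(key).digest()[:8], "big")
    return NEXT_HOPS[h % len(NEXT_HOPS)]

# Same flow always lands on the same LB; different flows spread out.
print(ecmp_next_hop("198.51.100.7", "203.0.113.1", 40001, 443))
print(ecmp_next_hop("198.51.100.8", "203.0.113.1", 40002, 443))
```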
8.
Legacy 90s Arch,
Boxapproach
• Proprietary Hardware
• Manage Each Device
• No Automation
• No Telemetry
• Static Capacity
The State of Load Balancing/Application Delivery
WebScale computing is here but load balancing is still a bottleneck!
Takeaways from
AWS/FB/Microsoft
• Commodity x86
• Manage As One
• Highly Automated
• Built-In Telemetry
• Elastic
Flexible, Fluid CapacityRigid
Legacy
ADC/LBs WEB SCALE TECH
Load
Balancers
9.
Modern Distributed Architecture
Separate control and data plane
Manage as one, not many devices
[Diagram: Controller (management plane: UI/CLI) managing load balancers (data plane) across virtualized, container, and public-cloud compute]
10.
Modern Distributed Architecture
Separate control and data plane
Manage as one, not many devices
[Diagram: Controller managing load balancers across virtualized, container, and public-cloud compute]
11.
Modern Distributed Architecture
Separate control and data plane
Manage as one, not many devices
[Diagram: Controller with a REST API, integrated with management & orchestration platforms (e.g., Mesos), driving load balancers across bare metal, virtualized, container, and public-cloud environments]
• Multi-Cloud Fabric – single solution, any environment
• Automation – highly programmable, plug-n-play (see the API sketch below)
• Built-In Visibility & Analytics – actionable insights are key to automation
• Innovation
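A hypothetical sketch of what "highly programmable" looks like in practice: driving the controller over REST so that one API call, not per-device configuration, provisions a virtual service. The endpoint paths, payload fields, and credentials below are illustrative assumptions, not the documented Avi API.

```python
import requests

CONTROLLER = "https://controller.example.net"   # illustrative address
session = requests.Session()
session.verify = False  # lab setup only; use real certificates in production

# Authenticate once against the controller (illustrative endpoint).
session.post(f"{CONTROLLER}/login",
             json={"username": "admin", "password": "secret"})

# Create a virtual service; the controller, not the operator, decides which
# load balancer instances carry it (illustrative payload).
resp = session.post(f"{CONTROLLER}/api/virtualservice", json={
    "name": "web-vip",
    "ip_address": "10.10.0.100",
    "services": [{"port": 443, "enable_ssl": True}],
    "pool": "web-pool",
})
print(resp.status_code)
```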
12.
1 Million TPS on Google Compute Engine – Setup
Avi Networks – Elastic Application Services Fabric
[Diagram: 320x test clients running ApacheBench (ab) on n1-highcpu-16 instances → GCP router → 40x Avi Service Engines (load balancers) managed by the Controller → application instances]
13.
Key Stats
- Total cost for the setup in Google Compute: < $50
- SSL TPS: 0 to 1 million TPS in a few seconds
- Data plane: 40 VM instances with 32 hyperthreaded cores each
- Traffic generators: 320 VM instances with 16 hyperthreaded cores each
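Back-of-envelope arithmetic on the stated figures:

```python
# 1M SSL TPS across 40 service engines, 32 hyperthreaded cores each.
TOTAL_SSL_TPS = 1_000_000
SERVICE_ENGINES = 40
CORES_PER_SE = 32

per_se = TOTAL_SSL_TPS / SERVICE_ENGINES   # 25,000 SSL TPS per service engine
per_core = per_se / CORES_PER_SE           # ~781 SSL TPS per hyperthreaded core
print(f"{per_se:,.0f} TPS per SE, {per_core:,.0f} TPS per core")
```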
14.
Test Setup and Methodology
• Setup in Google Compute
• Bootstrap instance – 1 g1-small instance
• Avi Controller – 1 n1-standard-4 instance
• Avi Service Engines (load balancers) – 40 n1-highcpu-32 instances
• Pool server – 1 g1-small instance
• Test clients (load/traffic generators) – 320 n1-highcpu-16 instances
• Running the test
• https://github.com/avinetworks/avi-test-scripts : this public repo has all the scripts required for anyone to perform the scalability test
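For flavor, each traffic-generator instance might run something like the following. The authoritative invocation lives in the scripts in the repo above, so the `ab` arguments and VIP address here are assumptions (and `ab` must be built with SSL support to hit an https URL).

```python
# Fan out several ApacheBench workers per client instance against the VIP.
import subprocess

VIP = "https://10.10.0.100/"   # illustrative virtual-service address

procs = [
    subprocess.Popen(["ab", "-n", "100000", "-c", "100", VIP],
                     stdout=subprocess.DEVNULL)
    for _ in range(8)          # several ab workers per client instance
]
for p in procs:
    p.wait()
```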
15.
Scale Performance Up and Out
Managed as one elastic load balancer fabric
SCALE-UP (more cores & I/O): LB performance scales with CPUs (Moore's Law) and I/O (40 Gbps NICs)
• 1 LB, 1 core – 5 Gbps, 2,500 SSL TPS
• 1 LB, 2 cores – 10 Gbps, 5,000 SSL TPS
• 1 LB, 24 cores (2 sockets) – 20 Gbps (10 Gbps NICs), 60,000 SSL TPS
SCALE-OUT (more LBs): fabric performance scales horizontally with LBs
• 2 LBs, 1 core each – 10 Gbps, 5,000 SSL TPS
• Single app performance – 640 Gbps, 1.9M SSL TPS
• Scale to 200 LBs – 4 Tbps, 12M SSL TPS
Centralized API, management, and monitoring
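A quick check that the scale-out figures are simple linear multiples of the single-LB numbers. The 32-LB count behind the "single app" figure is inferred from the 640 Gbps total, not stated on the slide.

```python
# Linear scale-out from the 24-core single-LB figures above.
PER_LB_GBPS = 20          # 1 LB, 24 cores, 10 Gbps NICs
PER_LB_SSL_TPS = 60_000

for n in (32, 200):
    print(f"{n:>3} LBs: {n * PER_LB_GBPS / 1000:.2f} Tbps, "
          f"{n * PER_LB_SSL_TPS / 1_000_000:.2f}M SSL TPS")
# 32 LBs  -> 0.64 Tbps (640 Gbps), 1.92M SSL TPS (the "single app" figure)
# 200 LBs -> 4.00 Tbps, 12.00M SSL TPS
```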
16.
Beyond Google Compute: Any Data Center or Public Cloud
[Diagram: Clients → load balancers managed by the Controller → application instances, in any data center or public cloud, mirroring the GCP router setup]
The New Rules of Elastic, Cost-effective Load Balancing
1. Take advantage of WebScale architectures
2. Use analytics-driven decisions for on-demand elasticity
3. Automate L4–L7 services with APIs
4. Leverage load balancers for application intelligence
5. Eliminate hardware overprovisioning
#3 Customer: I have hundreds of servers in my data center. With Chef, Puppet, and Ansible, I can turn a couple of racks into web servers within 5 minutes, and turn another 2 racks into app servers.
#10 Load balancers sit in the network today, in front of every business-critical application in your environment. These are largely standard x86 servers in a proprietary box, with each box containing its own management plane and data plane – so your teams manage each one as an independent appliance.
Avi's architecture consolidates the management plane into a centralized controller, which allows you to manage the data plane – what we call a service engine – as an elastic fabric that can grow and shrink based on capacity needs, without increasing the number of management points.
#11 As your infrastructure goes from bare metal to virtual to containers and the public cloud, you can now spin up the service engines as bare-metal appliances on standard x86 servers, as VMs, as containers, or in the public cloud, depending on the application needs you are trying to meet.
Bare-metal deployments offer an easy transition from an existing hardware-appliance environment to a software-defined one, while ensuring that a future move to virtual, container, or public-cloud environments is smooth.
Across all of these environments, the controller offers a single point of management and monitoring.
#12 The controller is pre-integrated with management and orchestration platforms like vCenter, SDN controllers, and container cluster managers like Mesos and Kubernetes, as well as public clouds like AWS. This allows a completely automated experience where service engines can be spun up or down and connected to networks automatically as needed.
Finally, given their strategic location in the network, load balancers are best positioned to provide visibility into application usage and performance. So we built hundreds of virtual probes into the service engines, which send real-time telemetry on application performance back to the controller. The controller's real-time analytics engine processes billions of data points to provide insights on application performance, usage, end-user experience, security posture, DDoS, and more. Operations teams can now track this data in real time for any application without needing monitoring fabrics, taps, or external network-based application performance monitoring solutions.