
Load Balancing: An In-Depth Study for Scale @ 80K TPS

This deck is the result of 60 days of research on software load-balancing technology at scale.


  1. In-Depth Study to Scale @ 80K TPS: Load Balancing
  2. Starting ❖ About me: Engineering @ Paytm; working on this problem for 2 months ❖ Problem: identify an entry solution for 80K TPS and 20K active transacting connections while keeping added latency < 2 ms ❖ Outline: evaluate and perf-test all sorts of LBs and routers, and classify them ❖ Not covering: after every solution, a slide lists what is not covered
  3. Evaluation criteria ▸ High availability (HA): unaffected service during a predefined number of simultaneous failures ▸ Balancing strategies: round robin, least connection, weighted ▸ Health checks ▸ Extensibility: C/Lua library support ▸ Monitoring and manageability ▸ Performance
  4. Categories of LB ❖ DNS based ❖ Software and hardware based ❖ Layer 3/4 proxying ❖ Layer 7 proxying ❖ Routing at L4
  5. DNS Based ❖ Multiple IPs behind one name: round robin ❖ No built-in concept of HA, monitoring, or health checks ❖ Health checks and routing policies are available via custom solutions (sketch below)
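
  A sketch of the round-robin idea: a zone publishes several A records for one name (hypothetical zone and addresses), and resolvers rotate across them.

      ; zone fragment: three A records for the same name
      www   300  IN  A  192.0.2.10
      www   300  IN  A  192.0.2.11
      www   300  IN  A  192.0.2.12

      ; verify the rotation from a client
      $ dig +short www.example.com
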
  6. Layer 3/4 Load Balancing ❖ Mostly hardware-based LBs ❖ No well-known program that runs in kernel space ❖ Software-based, user-space proxy LBs: HAProxy and Nginx (sketch below)
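
  A minimal sketch of HAProxy as an L4 (TCP) user-space proxy; names, ports, and backend IPs are assumptions, not the deck's actual config.

      # haproxy.cfg: raw TCP proxying with least-connection balancing
      frontend fe_l4
          mode tcp
          bind *:443
          default_backend be_l4

      backend be_l4
          mode tcp
          balance leastconn
          server app1 10.0.0.11:443 check
          server app2 10.0.0.12:443 check
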
  7. HAProxy Monitoring ❖ Socket-based stats are available, ~60 CSV fields ❖ Web interface (both shown below)
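
  A sketch of both monitoring paths (socket path and port are assumptions); "show stat" returns the ~60-column CSV mentioned above.

      # haproxy.cfg: stats socket plus the web interface
      global
          stats socket /var/run/haproxy.sock mode 600 level admin

      listen stats
          bind *:8404
          stats enable
          stats uri /stats

      # pull the CSV counters over the socket (socat assumed available)
      $ echo "show stat" | socat stdio /var/run/haproxy.sock
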
  8. Benchmarking Env
  9. Issues with HAProxy L4 ❖ Scale constraint: CPU-bound; all cores at 100% with a 1-minute load average of 64 ❖ Benchmark: 20K TPS with keep-alive off and 100 ms backend latency
  10. Layer 7 Load Balancing ❖ Hardware-based LBs: F5, Fortinet ❖ Protocol rigidity ❖ No well-known program that runs in kernel space ❖ Software based: Nginx and HAProxy are the popular ones (sketch below) ❖ Benchmarking issues with Nginx as L7: even more CPU-constrained than L4, 18-20K TPS in the same env
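
  A minimal sketch of Nginx as the L7 proxy being benchmarked (pool name, ports, and IPs are assumptions):

      # nginx.conf fragment: HTTP proxying to an upstream pool
      upstream app_pool {
          least_conn;
          server 10.0.0.11:8080;
          server 10.0.0.12:8080;
      }

      server {
          listen 80;
          location / {
              proxy_pass http://app_pool;
          }
      }
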
  11. Not covering these for HAProxy ❖ Security aspects: iptables, WAF, SELinux ❖ Bare-metal machines' detailed specs and part numbers ❖ Decision on choice of machine ❖ Networking details ❖ NIC bonding specs ❖ Benchmark tool detailing: GOR
  12. Routing L3/4 ❖ What is routing? ❖ Routing scales: it needs less than half the resources that proxying does
  13. Types of routing ❖ NAT: works like a proxy ❖ Direct routing: rewrite the destination MAC address and let the real server reply directly ❖ IP tunneling: most scalable; works over an IPIP tunnel (even across different DCs); flags shown below
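
  The three methods map one-to-one onto ipvsadm forwarding flags; a sketch with a hypothetical VIP and real servers:

      # virtual service on the VIP, round robin
      ipvsadm -A -t 192.0.2.100:80 -s rr
      # NAT: director proxies both directions (-m, masquerading)
      ipvsadm -a -t 192.0.2.100:80 -r 10.0.0.11:80 -m
      # direct routing: MAC rewrite, real server replies directly (-g)
      ipvsadm -a -t 192.0.2.100:80 -r 10.0.0.12:80 -g
      # IP tunneling: IPIP encapsulation, works across DCs (-i)
      ipvsadm -a -t 192.0.2.100:80 -r 10.0.0.13:80 -i
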
  14. Routers ❖ Hardware routers: not designed to be horizontally scalable ❖ No well-known horizontally scalable hardware routers ❖ We needed a software router: LVS/IPVS
  15. Software Router: LVS ❖ LVS: Linux Virtual Server, ~20 years old, both layer 4 and 7 ❖ IPVS: IP Virtual Server, merged into kernel 2.4 ❖ KTCPVS: application-level LB, in development for the last 8 years ❖ Runs in kernel space ❖ Supports different distribution methods: RR, least connection, weighted LC (sketch below)
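
  Schedulers are chosen per virtual service with -s (rr, lc, wlc, ...); a weighted least-connection sketch, with hypothetical addresses and weights:

      # wlc: new connections go to the server with the lowest connections/weight ratio
      ipvsadm -A -t 192.0.2.100:80 -s wlc
      ipvsadm -a -t 192.0.2.100:80 -r 10.0.0.11:80 -i -w 2
      ipvsadm -a -t 192.0.2.100:80 -r 10.0.0.12:80 -i -w 1
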
  16. LVS Issues ❖ CPU affinity of interrupts ❖ RP filter bypass ❖ Manageability and monitoring ❖ HA ❖ IP tunnel extensibility
  17. LVS: CPU Affinity ❖ The kernel tries to load-balance IRQs (interrupt request lines) across cores; the irqbalance service is responsible ❖ cat /proc/interrupts shows which core is maxing out ❖ Balance (1): echo fff > /sys/class/net/eth0/queues/rx-0/rps_cpus ❖ Balance (2): echo 'fff' > /proc/irq/14/smp_affinity ❖ Balance (3): echo '0-3' > /proc/irq/28/smp_affinity_list ❖ Combined sketch below
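
  A sketch tying those knobs together: stop irqbalance, then pin each of the NIC's IRQ lines to its own core (interface name and core layout are assumptions):

      #!/bin/bash
      # spread eth0's IRQ lines round-robin across all cores
      systemctl stop irqbalance
      cores=$(nproc)
      i=0
      for irq in $(awk '/eth0/ { sub(":", "", $1); print $1 }' /proc/interrupts); do
          echo $((i % cores)) > /proc/irq/$irq/smp_affinity_list
          i=$((i + 1))
      done
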
  18. LVS: RP Filter ❖ RP filter: guards against spoofing and DDoS ❖ The kernel checks whether the source of a received packet is reachable through the interface it arrived on ❖ To disable: net.ipv4.conf.tun.rp_filter = 0 in /etc/sysctl.conf (then sysctl -p); full sketch below
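
  A sketch of the full setting, assuming the tunnel device is the kernel-default tunl0 (the slide's device is named tun). The kernel applies the maximum of the "all" and per-interface values, so both usually need to be 0:

      # /etc/sysctl.conf: relax reverse-path filtering for the tunnel
      net.ipv4.conf.all.rp_filter = 0
      net.ipv4.conf.tunl0.rp_filter = 0

      # apply without a reboot
      $ sysctl -p
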
  19. LVS: Monitoring & Management ❖ Managed by system calls, no config file (use Consul Template) ❖ Logging: no logs in user space; kernel messages for errors ❖ Monitoring: Telegraf plugin available (internals: ipvsadm --list with --numeric / --connection / --stats / --rate; one-liners below)
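
  The ipvsadm calls behind that monitoring, spelled out:

      ipvsadm --list --numeric       # current table, numeric addresses
      ipvsadm --list --stats         # per-service packet/byte counters
      ipvsadm --list --rate          # conns/s, packets/s, bytes/s
      ipvsadm --list --connection    # live connection entries
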
  20. LVS: HA ❖ Keepalived + VIP ❖ Connection sync service ❖ ipvsadm --start-daemon=master|backup --mcast-interface=<> --syncid <> (sketch below)
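
  A minimal sketch of the pair (interface, router-id, priority, and VIP are assumptions): keepalived floats the VIP via VRRP, while the IPVS sync daemons multicast connection state so the backup can take over without dropping established connections.

      # /etc/keepalived/keepalived.conf on the primary director
      vrrp_instance VI_1 {
          state MASTER
          interface eth0
          virtual_router_id 51
          priority 100
          virtual_ipaddress {
              192.0.2.100/32
          }
      }

      # connection sync daemons
      ipvsadm --start-daemon=master --mcast-interface=eth0 --syncid=1   # primary
      ipvsadm --start-daemon=backup --mcast-interface=eth0 --syncid=1   # standby
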
  21. LVS: Health Check ❖ Keepalived for the director's own health check ❖ Consul Template for real-server health checks (sketch below)
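
  A sketch of the Consul Template approach: render only Consul-healthy instances into a script that rebuilds the IPVS table (service name, VIP, and file paths are assumptions):

      # lvs.ctmpl: regenerate real servers from healthy "app" instances
      ipvsadm --clear
      ipvsadm -A -t 192.0.2.100:80 -s wlc
      {{ range service "app" }}ipvsadm -a -t 192.0.2.100:80 -r {{ .Address }}:{{ .Port }} -i
      {{ end }}

      # re-render and apply whenever membership or health changes
      consul-template -template "lvs.ctmpl:apply.sh:bash apply.sh"
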
  22. LVS IPIP Debugging ❖ Extending the IPIP tunnel and VIP to multiple machines: painful ❖ IPIP tunnel issues and recovery across DCs ❖ Set up probes and packet capture (sketch below)
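
  A sketch of the real-server side of LVS-Tun plus a capture probe (VIP and device names are assumptions):

      # on each real server: accept IPIP-encapsulated packets for the VIP
      modprobe ipip
      ip addr add 192.0.2.100/32 dev tunl0
      ip link set tunl0 up
      # keep the VIP from answering ARP on the LAN (the classic LVS ARP problem)
      sysctl -w net.ipv4.conf.all.arp_ignore=1
      sysctl -w net.ipv4.conf.all.arp_announce=2

      # probe: confirm encapsulated traffic is arriving (IP protocol 4 = IPIP)
      tcpdump -ni eth0 'ip proto 4'
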
  23. Final Load Test
  24. Final Arch
  25. Willy Tarreau: HAProxy ❖ Creator of HAProxy ❖ wtarreau.blogspot.com/2006/11/making-applications-scalable-with-load.html ❖ The structure of this deck is based on that article.
  26. References ❖ Shrey Agarwal: in.linkedin.com/in/shreyagarwal ❖ wtarreau.blogspot.com/2006/11/making-applications-scalable-with-load.html ❖ opensourceforu.com/2009/05/balancing-traffic-across-data-centres-using-lvs/ ❖ www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.LVS-Tun.html ❖ linux.die.net/man/8/ipvsadm ❖ serverfault.com/questions/723786/udp-packets-seen-on-interface-level-but-not-delivered-to-application-on-redhat ❖ serverfault.com/questions/163244/linux-kernel-not-passing-through-multicast-udp-packets
