Load Balancing : In-Depth Study to Scale @ 80K TPS
Starting
❖ About Me : Engineering @ Paytm. Working on this problem for 2 months.
❖ Problem : Identify an entry-point solution for 80K TPS and 20K active transacting connections, while keeping the added latency < 2 ms.
❖ Outline : Evaluate and perf-test each kind of LB and router, and classify them.
❖ Not Covering : After each solution, a slide lists the things that are not covered.
Evaluation criteria
▸ High Availability ( HA ) : Service stays unaffected during any predefined number of simultaneous failures
▸ Balancing strategies : Round robin, least connection, weighted
▸ Health checks
▸ Extensibility : C/Lua library support
▸ Monitoring and manageability
▸ Performance
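The balancing strategies named above can be sketched in a few lines. This is a minimal illustration, not the implementation of any particular load balancer; the class and function names (`RoundRobin`, `LeastConnection`, `weighted_ring`) are hypothetical.

```python
from itertools import cycle


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        self._ring = cycle(list(backends))

    def pick(self):
        return next(self._ring)


class LeastConnection:
    """Pick the backend with the fewest in-flight connections."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when a connection to this backend finishes.
        self.active[backend] -= 1


def weighted_ring(weights):
    """Expand {"a": 2, "b": 1} into ["a", "a", "b"] for a weighted rotation."""
    return [b for b, w in weights.items() for _ in range(w)]


rr = RoundRobin(weighted_ring({"a": 2, "b": 1}))
print([rr.pick() for _ in range(6)])
```

Feeding `weighted_ring` into `RoundRobin` gives the simplest form of weighted round robin; production balancers usually interleave the picks more smoothly.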
Categories of LB
❖ DNS Based
❖ Software & Hardware Based
❖ Layer 3/4 Proxying
❖ Layer 7 Proxying
❖ Routing at L4
DNS Based
❖ Multiple IPs : Round robin
❖ No concept of HA, monitoring, or health checks
❖ Health checks and routing policies are available via custom solutions
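A small simulation shows why the missing health checks matter. It assumes a resolver that simply rotates through the A records; the IP addresses and the "dead" server are made up for illustration.

```python
from collections import Counter
from itertools import cycle


def dns_round_robin(records, requests):
    """Simulate rotating through A records with no health checks."""
    ring = cycle(records)
    return Counter(next(ring) for _ in range(requests))


records = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical A records
dead = {"10.0.0.2"}                              # hypothetical failed server

hits = dns_round_robin(records, 300)
failed = sum(n for ip, n in hits.items() if ip in dead)
print(f"{failed} of 300 requests were sent to a dead server")
```

With plain rotation, a failed server keeps receiving its full share of traffic until someone removes its record.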
Layer 3/4 Load Balancing
❖ Mostly hardware-based LBs.
❖ No well-known program runs in kernel space.
❖ Software-based examples are user-space proxy LBs : HAProxy and Nginx
HAProxy Monitoring
❖ Socket-based stats are available as CSV with ~60 fields
❖ Web interface
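The socket-based stats can be read with a short script. This is a sketch, assuming the stats socket is enabled and lives at `/var/run/haproxy.sock` (the path is an assumption; it is whatever the `stats socket` directive names). `show stat` is the HAProxy command that returns the CSV; `query_stats` and `parse_stats` are hypothetical helper names.

```python
import csv
import io
import socket


def query_stats(sock_path="/var/run/haproxy.sock"):
    """Send 'show stat' to the stats socket and return the raw CSV text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(b"show stat\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:          # HAProxy closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def parse_stats(text):
    """Turn 'show stat' CSV output into a list of dicts keyed by field name."""
    lines = [line for line in text.splitlines() if line.strip()]
    lines[0] = lines[0].lstrip("# ")  # the header line starts with '# '
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


# Shortened, made-up sample; real output has many more columns.
sample = "# pxname,svname,scur,stot,\nweb,FRONTEND,12,3400,\nweb,srv1,7,1700,\n"
for row in parse_stats(sample):
    print(row["pxname"], row["svname"], row["scur"])
```

On a live host, `parse_stats(query_stats())` would return one dict per frontend, backend, and server.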
Benchmarking Env
Issues with HAProxy L4
❖ Scale constraint
❖ CPU only : all cores at 100%, with a 1-minute load average of 64
❖ Benchmark
❖ 20K TPS, keep-alive off, and 100 ms backend latency
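The benchmark figures imply a rough number of requests in flight at the proxy. The sketch below applies Little's law (average concurrency = arrival rate x time in system) to the numbers on the slide; it ignores the proxy's own processing time, and the function name is hypothetical.

```python
def in_flight(tps, latency_ms):
    """Little's law: average concurrent requests = rate x time in system."""
    return tps * latency_ms / 1000


# 20K TPS with 100 ms backend latency, as in the benchmark above.
print(in_flight(20_000, 100))
```

With keep-alive off, every one of those transactions also pays for a TCP connection setup and teardown, which is consistent with CPU being the limiting resource.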
Layer 7 load balancing
❖ Hardware-based LBs : F5, Fortinet.
❖ Protocol rigidity
❖ No well-known program runs in kernel space.
❖ Software-based : Nginx and HAProxy are the popular ones.
❖ Benchmarking issues with Nginx as L7
❖ Even more CPU-constrained than L4 : 18-20K TPS in the same environment
Not covering these for HAProxy
❖ Security aspects : iptables, WAF, SELinux
❖ Bare-metal machines : detailed specs and part numbers
❖ Decision on choice of machine
❖ Networking details
❖ NIC bonding specs
❖ Benchmark tool details : GOR
Routing L3/4
❖ What is routing
❖ Routing scales : it requires less than half the resources of proxying.
Types of routing
❖ NAT : Works like a proxy
❖ Direct Routing : Rewrite the destination MAC address and forward the packet; the real server replies to the client directly.
❖ IP Tunneling : Most scalable, works over an IPIP tunnel ( across different DCs )
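In LVS, the three routing types map to three forwarding flags of `ipvsadm`: `-m` (masquerading, i.e. NAT), `-g` (gatewaying, i.e. direct routing), and `-i` (IPIP encapsulation). The sketch below only builds the command lines and does not run them; the helper name and all addresses are made up for illustration.

```python
FORWARDING_FLAGS = {
    "nat": "-m",  # masquerading (NAT)
    "dr": "-g",   # gatewaying (direct routing)
    "tun": "-i",  # ipip encapsulation (tunneling)
}


def ipvs_commands(vip, real_servers, method, scheduler="rr"):
    """Build ipvsadm argument lists for one TCP virtual service."""
    flag = FORWARDING_FLAGS[method]
    cmds = [["ipvsadm", "-A", "-t", vip, "-s", scheduler]]
    for rip in real_servers:
        cmds.append(["ipvsadm", "-a", "-t", vip, "-r", rip, flag])
    return cmds


# Hypothetical VIP and real servers.
for cmd in ipvs_commands("192.0.2.10:80", ["10.0.0.2:80", "10.0.0.3:80"], "tun"):
    print(" ".join(cmd))
```

Returning argument lists rather than strings keeps the output ready for `subprocess.run` without shell quoting.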
Routers
❖ Hardware routers : Not designed to be horizontally scalable
❖ No well-known horizontally scalable hardware routers.
❖ We needed a software router : LVS/IPVS
Software Router : LVS
❖ LVS : Linux Virtual Server, 20 years old, both Layer 4 and 7
❖ IPVS : IP Virtual Server, merged in kernel 2.4
❖ KTCPVS : Application-level LB, in development for the last 8 years.
❖ Runs in kernel space
❖ Supports different distribution methods : RR, least connection, weighted LC
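Weighted least connection can be sketched as follows. The sketch assumes the overhead formula used by the IPVS `wlc` scheduler (active connections counted 256 times heavier than inactive ones, divided by the server weight) and leaves out everything else the kernel does; the `RealServer` class and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RealServer:
    name: str
    weight: int
    active: int = 0    # established connections
    inactive: int = 0  # connections in any other state

    def overhead(self):
        # Active connections dominate the cost estimate.
        return self.active * 256 + self.inactive


def pick_wlc(servers):
    """Return the server with the lowest overhead per unit of weight."""
    candidates = [s for s in servers if s.weight > 0]
    return min(candidates, key=lambda s: s.overhead() / s.weight)


servers = [
    RealServer("a", weight=1, active=10),
    RealServer("b", weight=3, active=20),
]
print(pick_wlc(servers).name)
```

Server `b` carries more connections but has three times the weight, so its overhead per unit of weight is lower and it is picked.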
LVS Issues
❖ CPU Affinity of Interrupts
❖ RP Filter Bypass
❖ Manageability and Monitoring
❖ HA
❖ IP Tunnel Extensibility
LVS : CPU Affinity
❖ CPU affinity of interrupts
❖ The system tries to load balance IRQs ( interrupt request lines ) across cores.
❖ The irqbalance service is responsible.
❖ cat /proc/interrupts helps show which core will max out.
❖ Balance (1) : echo fff > /sys/class/net/eth0/queues/rx-0/rps_cpus
❖ Balance (2) : echo 'fff' > /proc/irq/14/smp_affinity
❖ Balance (3) : echo '0-3' > /proc/irq/28/smp_affinity_list
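The values written in the three commands above are a hex bitmask (one bit per CPU) or a CPU list. A small sketch of how the two forms relate, with hypothetical helper names:

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpu_mask(cpus):
    """Return the hex bitmask with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


print(cpu_mask(range(12)))              # CPUs 0-11
print(cpu_mask(parse_cpu_list("0-3")))  # CPUs 0-3
```

So `fff` in Balance (1) and (2) selects CPUs 0 to 11, and the list `0-3` in Balance (3) is the same thing as the mask `f`.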
LVS : RP Filter
❖ RP filter ( reverse path filter ) : Guards against spoofing and DDoS
❖ The kernel checks whether the source of a received packet is reachable through the interface it came in on.
❖ To disable : set net.ipv4.conf.tun.rp_filter = 0 in /etc/sysctl.conf ( and run sysctl -p )
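The check described above can be modelled in a few lines. This is a simplified sketch of strict reverse-path filtering, not the kernel's code; the routing table, the addresses, and the interface names are made up, with `tun` standing for the tunnel interface named on the slide.

```python
import ipaddress


def route_lookup(routes, addr):
    """Longest-prefix match: return the interface used to reach addr."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, dev) for net, dev in routes if ip in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def strict_rp_filter_accepts(routes, src, in_dev):
    """Accept only if the route back to src leaves via the arrival interface."""
    return route_lookup(routes, src) == in_dev


# Hypothetical real server: everything is routed out of eth0.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),
    (ipaddress.ip_network("10.0.0.0/24"), "eth0"),
]

client = "203.0.113.7"  # hypothetical client address
print(strict_rp_filter_accepts(routes, client, "eth0"))  # arrives on eth0
print(strict_rp_filter_accepts(routes, client, "tun"))   # arrives via the tunnel
```

A client packet that arrives through the tunnel interface fails the check because the route back to the client goes out of `eth0`, which is why the filter is turned off on that interface.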
LVS : Monitoring & management
❖ Managed by system calls, no config ( use Consul Template )
❖ Logging : No logs in user space, kernel messages for errors
❖ Monitoring : Telegraf plugin available ( internals : ipvsadm --list --numeric / --connection / --stats / --rate )
LVS : HA
❖ Keepalived + VIP
❖ Connection sync service
❖ ipvsadm --start/stop-daemon=master/backup --mcast-interface=<> --syncid <>
LVS : HealthCheck
❖ Keepalived for the director's own health check
❖ Consul Template for real server health check
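A real server health check boils down to probing each backend and keeping only the ones that answer. The sketch below uses a plain TCP connect as the probe; it illustrates the idea only and is not how Keepalived or Consul Template implement their checks. The function names are hypothetical.

```python
import socket


def tcp_alive(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy(servers, timeout=1.0):
    """Filter a list of (host, port) pairs down to those that accept connections."""
    return [(h, p) for h, p in servers if tcp_alive(h, p, timeout)]
```

The resulting list is what a templating step would render into the set of real servers behind the VIP.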
LVS IPIP Debugging
❖ Extending the IPIP tunnel and VIP to multiple machines : painful
❖ IPIP tunnel issues and recovery across DCs
❖ Set up probes and packet capture
Final Load Test
Final Arch
Willy Tarreau : HAProxy
❖ Creator of HAProxy
❖ wtarreau.blogspot.com/2006/11/making-applications-scalable-with-load.html
❖ The structure of this deck is based on that article.
Shrey Agarwal
in.linkedin.com/in/shreyagarwal
References
❖ wtarreau.blogspot.com/2006/11/making-applications-scalable-with-load.html
❖ opensourceforu.com/2009/05/balancing-traffic-across-data-centres-using-lvs/
❖ www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.LVS-Tun.html
❖ linux.die.net/man/8/ipvsadm
❖ serverfault.com/questions/723786/udp-packets-seen-on-interface-level-but-not-delivered-to-application-on-redhat
❖ serverfault.com/questions/163244/linux-kernel-not-passing-through-multicast-udp-packets
