SlideShare a Scribd company logo
1 of 23
Scaling Kubernetes to Support 50,000 Services
Haibin Michael Xie, Senior Staff Engineer/Architect, Huawei
Quinton Hoole, Technical Vice President, Huawei
Agenda
• Challenges while scaling services
• Solutions and prototypes
• Performance data
• Q&A
Master
What are the Challenges while Scaling Services
• Control plane (Master, kubelet,
kube-proxy) API Server
Controller
Manager
Scheduler
Node
KubeProxy
…
ETCD
Kubelet
Load Balancer
• Deploy services and pods
• Propagate endpoints
• Add/remove services in load balancer
• Network access to services
• Data plane (load balancer)
Pod Pod Pod
Node
KubeProxy Kubelet
Load Balancer
Pod Pod Pod
API Server
Control Plane
ETCD
services
Controller
Manager
pods
endpoints
Endpoints
Controller
Node
KubeProxy
Node
KubeProxy
Node
KubeProxy
… …
N nodes per cluster
M pods per second
QPS: N*M endpoints per second
Service deployed
Pod deployed and
scheduled
Endpoints
/registry/services/endpoints/default/my-service
/registry/services/specs/default/my-service
API Server
Control Plane
ETCD
services
Controller
Manager
pods
endpoints
Endpoints
Controller
Node
KubeProxy
Node
KubeProxy
Node
KubeProxy
… …
N nodes per cluster
M pods per second
QPS: N*M endpoints per second
Load: N*M*(M+1)/2 addresses per second
Control Plane Solution Alternatives
1. Partition endpoints object into multiple objects
• Pros: reduce Endpoints object size
• Cons: increase # of objects and requests
2. Central load balancer
• Pros: reduce connections and requests to API server
• Cons: one more hop in service routing, require strong HA, limited LB scalability
3. Batch creating/updating endpoints
• Timer based, no change to data structure in ETCD
• Pros: reduce QPS
• Cons: E2E latency is increased by Batch interval
API Server
Control Plane Solution
ETCD
services
Controller
Manager
pods
endpoints
Endpoints
Controller
Node
KubeProxy
Node
KubeProxy
Node
KubeProxy
… …
QPS: N*M per second
Load: N*M*(M+1)/2 addresses per second
Timer and batch QPS: N*M per second
Load: N*M*(M+1)/2 addresses per second
QPS: N per second
Load: N*M addresses per second
N nodes per cluster
M pods per second
Batch Processing Request Reduction
One batch per 0.5 second.
➢ QPS:reduced 98%
Pods per
Service
Number of
Service
EndPoints Controller # of Requests
Before After Reduction
10
100 551 10 98.2%
150 785 14 98.2%
200 1105 17 98.5%
Test setup:
1 Master,4 slaves
16 core 2.60GHz, 48GB RAM
Batch Processing E2E Latency Reduction
Latency: reduced 60+% Pods per
Service
Number of
Service
E2E Latency (Second)
Before After Reduction
10
100 8.5 3.5 59.1%
150 13.5 5.3 60.9%
200 22.8 7.8 65.8%
Data Plane
• What is IPTables?
• iptables is a user-space application that allows configuring Linux kernel
firewall (implemented on top of Netfilter) by configuring chains and
rules.
• What is Netfilter? A framework provided by the Linux kernel that allows
customization of networking-related operations, such as packet
filtering, NAT, port translation etc.
• Issues with IPTables as a load balancer
• Latency to access service (routing latency)
• Latency to add/remove rules
IPTables Example
# Iptables –t nat –L –n
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */ ← 1
DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
Chain KUBE-SEP-G3MLSGWVLUPEIMXS (1 references) ← 4
target prot opt source destination
MARK all -- 172.16.16.2 anywhere /* default/webpod-service: */ MARK set 0x4d415351
DNAT tcp -- anywhere anywhere /* default/webpod-service: */ tcp to:172.16.16.2:80
Chain KUBE-SEP-OUBP2X5UG3G4CYYB (1 references)
target prot opt source destination
MARK all -- 192.168.190.128 anywhere /* default/kubernetes: */ MARK set 0x4d415351
DNAT tcp -- anywhere anywhere /* default/kubernetes: */ tcp to:192.168.190.128:6443
Chain KUBE-SEP-PXEMGP3B44XONJEO (1 references) ← 4
target prot opt source destination
MARK all -- 172.16.91.2 anywhere /* default/webpod-service: */ MARK set 0x4d415351
DNAT tcp -- anywhere anywhere /* default/webpod-service: */ tcp to:172.16.91.2:80
Chain KUBE-SERVICES (2 references) ← 2
target prot opt source destination
KUBE-SVC-N4RX4VPNP4ATLCGG tcp -- anywhere 192.168.3.237 /* default/webpod-service: cluster IP */ tcp dpt:http
KUBE-SVC-6N4SJQIF3IX3FORG tcp -- anywhere 192.168.3.1 /* default/kubernetes: cluster IP */ tcp dpt:https
KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type
LOCAL
Chain KUBE-SVC-6N4SJQIF3IX3FORG (1 references)
target prot opt source destination
KUBE-SEP-OUBP2X5UG3G4CYYB all -- anywhere anywhere /* default/kubernetes: */
Chain KUBE-SVC-N4RX4VPNP4ATLCGG (1 references) ← 3
target prot opt source destination
KUBE-SEP-G3MLSGWVLUPEIMXS all -- anywhere anywhere /* default/webpod-service: */ statistic mode random probability 0.50000000000
KUBE-SEP-PXEMGP3B44XONJEO all -- anywhere anywhere /* default/webpod-service: */
IPTables Service Routing Performance
1 Service (µs) 1000 Services (µs) 10000 Services (µs) 50000 Services (µs)
First Service 575 614 1023 1821
Middle Service 575 602 1048 4174
Last Service 575 631 1050 7077
In this test, there is one entry per service in KUBE-SERVICES chain.
Where is latency generated?
• Long list of rules in a chain
• Enumerate through the list to find a service and pod
Latency to Add IPTables Rules
• Where is the latency generated?
• not incremental
• copy all rules
• make changes
• save all rules back
• IPTables locked during rule update
• Time spent to add one rule when there are 5k services (40k rules): 11
minutes
• 20k services (160k rules): 5 hours
Data Plane Solution
• Re-structure IPTables using a search tree
○ Performance improvement
• Replace IPTables with “IP Virtual Server” (IPVS)
○ Dramatic performance
○ Much smarter load balancing
■ weighted
■ retries
■ … etc
VIP
Restructure IPTables as a Search Tree
10.10.0.0/16
10.10.1.0/24
VIP: 10.10.1.5 VIP:10.10.1.100
10.10.100.0/24
VIP:10.10.100.1
Service VIP range: 10.10.0.0/16
CIDR list = [16, 24], defines tree layout
Create 3 services: 10.10.1.5, 10.10.1.100, 10.10.100.1
Search tree based service routing time complexity: , m is tree depth
Original service routing time complexity: O(n)
What is IP Virtual Server (IPVS)?
• Transport layer load balancer which directs requests for TCP and UDP
based services to real servers.
• Part of the Linux Virtual Server (LVS) project.
• As with IPTables, IPVS is built on top of Netfilter.
• Supports 3 load balancing modes:
• NAT
• DR (L2 load balancing via MAC rewriting)
• IP Tunneling
IPVS vs. IPTables
IPTables:
• Manipulates tables provided by the linux firewall
• IPTables is more flexible and can manipulate packets at different stages: Pre-
routing, post-routing, forward, input, output.
• IPTables has more operations: SNAT, DNAT, reject packets, port translation etc.
Why use IPVS?
• Better performance (Hashing vs. Chains)
• More load balancing algorithms
• Round robin, source/destination hashing.
• Based on least load, least connection or locality, can assign weights to servers.
• Supports server health checks and connection retries
• Supports sticky sessions
IPVS Load Balancing Mode in Kubernetes
• Not publically released yet
• No Kubernetes behavior changes, fully compatible:
• external IP, nodePort etc
• Added kube-proxy startup parameter mode=IPVS, in addition to
original modes: mode=userspace and mode=iptables
• Original kube-proxy lines of code: 11800
• IPVS mode adds 680 lines of code
• dependent on seasaw library
IPVS vs. IPTables Latency to Add Rules
# of Services 1 5,000 20,000
# of Rules 8 40,000 160,000
IPTables 2 ms 11 min 5 hours
IPVS 2 ms 2 ms 2 ms
Measured by iptables and ipvsadm, observations:
➢ In IPTables mode, latency to add rule increases significantly when # of service increases
➢ In IPVS mode, latency to add VIP and backend IPs does not increase when # of service increases
IPVS vs. IPTables Network Bandwidth
➢Measured by qperf
➢Bandwidth, QPS, Latency have similar pattern
➢Env: 1 master, 4 slaves, 8 pods, all services use these 8 pods
➢Each service exposes 4 ports (4 entries in KUBE-SERVICES chain)
ith service first first last first last first last first last first last
# of services 1 1000 1000 5000 5000 10000 10000 25000 25000 50000 50000
Bandwidth, IPTables (MB/S) 66.6 64 56 50 38.6 15 6 0 0 0 0
Bandwidth, IPVS (MB/S) 65.3 61.7 55.3 53.5 53.8 43 43.5 30 28.5 24 23.8
Other Perf/Scalability Work We Have Done...
• Scale nodes and pods in single cluster
• Reduce E2E latency of deploying pods/services
• Increase pod deployment throughput
• Improve scheduling performance
• Optimistic concurrent scheduler
• Flow graph based high performance scheduling and rescheduling with the
University of Cambridge (UK).
Thank You
quinton.hoole@huawei.com
haibin.michael.xie@huawei.com

More Related Content

Similar to Scaling Kubernetes to Support 50000 Services.pptx

(NET404) Making Every Packet Count
(NET404) Making Every Packet Count(NET404) Making Every Packet Count
(NET404) Making Every Packet CountAmazon Web Services
 
Load Balancing 101
Load Balancing 101Load Balancing 101
Load Balancing 101HungWei Chiu
 
Evolution of kube-proxy (Brussels, Fosdem 2020)
Evolution of kube-proxy (Brussels, Fosdem 2020)Evolution of kube-proxy (Brussels, Fosdem 2020)
Evolution of kube-proxy (Brussels, Fosdem 2020)Laurent Bernaille
 
Osnug meetup-tungsten fabric - overview.pptx
Osnug meetup-tungsten fabric - overview.pptxOsnug meetup-tungsten fabric - overview.pptx
Osnug meetup-tungsten fabric - overview.pptxM.Qasim Arham
 
AWS re:Invent 2016: Making Every Packet Count (NET404)
AWS re:Invent 2016: Making Every Packet Count (NET404)AWS re:Invent 2016: Making Every Packet Count (NET404)
AWS re:Invent 2016: Making Every Packet Count (NET404)Amazon Web Services
 
KubeCon EU 2016: Creating an Advanced Load Balancing Solution for Kubernetes ...
KubeCon EU 2016: Creating an Advanced Load Balancing Solution for Kubernetes ...KubeCon EU 2016: Creating an Advanced Load Balancing Solution for Kubernetes ...
KubeCon EU 2016: Creating an Advanced Load Balancing Solution for Kubernetes ...KubeAcademy
 
DPDK Summit 2015 - HP - Al Sanders
DPDK Summit 2015 - HP - Al SandersDPDK Summit 2015 - HP - Al Sanders
DPDK Summit 2015 - HP - Al SandersJim St. Leger
 
Open stackaustinmeetupsept21
Open stackaustinmeetupsept21Open stackaustinmeetupsept21
Open stackaustinmeetupsept21Brent Doncaster
 
The Journey to the Kubernetes networking.pdf
The Journey to the Kubernetes networking.pdfThe Journey to the Kubernetes networking.pdf
The Journey to the Kubernetes networking.pdfChenYiHuang5
 
VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead VMworld
 
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with PrometheusOpenStack Korea Community
 
Design and Performance Characteristics of Tap-as-a-Service
Design and Performance Characteristics of Tap-as-a-ServiceDesign and Performance Characteristics of Tap-as-a-Service
Design and Performance Characteristics of Tap-as-a-Servicesoichi shigeta
 
Service Assurance for Virtual Network Functions in Cloud-Native Environments
Service Assurance for Virtual Network Functions in Cloud-Native EnvironmentsService Assurance for Virtual Network Functions in Cloud-Native Environments
Service Assurance for Virtual Network Functions in Cloud-Native EnvironmentsNikos Anastopoulos
 
Kubernetes Internals
Kubernetes InternalsKubernetes Internals
Kubernetes InternalsShimi Bandiel
 
Support of containerized workloads in ONAP
Support of containerized workloads in ONAPSupport of containerized workloads in ONAP
Support of containerized workloads in ONAPVictor Morales
 
Replacing iptables with eBPF in Kubernetes with Cilium
Replacing iptables with eBPF in Kubernetes with CiliumReplacing iptables with eBPF in Kubernetes with Cilium
Replacing iptables with eBPF in Kubernetes with CiliumMichal Rostecki
 
Presentation oracle net services
Presentation    oracle net servicesPresentation    oracle net services
Presentation oracle net servicesxKinAnx
 
ddsf-student-presentation_756205.pptx
ddsf-student-presentation_756205.pptxddsf-student-presentation_756205.pptx
ddsf-student-presentation_756205.pptxssuser498be2
 
Summit 16: How to Compose a New OPNFV Solution Stack?
Summit 16: How to Compose a New OPNFV Solution Stack?Summit 16: How to Compose a New OPNFV Solution Stack?
Summit 16: How to Compose a New OPNFV Solution Stack?OPNFV
 

Similar to Scaling Kubernetes to Support 50000 Services.pptx (20)

(NET404) Making Every Packet Count
(NET404) Making Every Packet Count(NET404) Making Every Packet Count
(NET404) Making Every Packet Count
 
Load Balancing 101
Load Balancing 101Load Balancing 101
Load Balancing 101
 
Evolution of kube-proxy (Brussels, Fosdem 2020)
Evolution of kube-proxy (Brussels, Fosdem 2020)Evolution of kube-proxy (Brussels, Fosdem 2020)
Evolution of kube-proxy (Brussels, Fosdem 2020)
 
Osnug meetup-tungsten fabric - overview.pptx
Osnug meetup-tungsten fabric - overview.pptxOsnug meetup-tungsten fabric - overview.pptx
Osnug meetup-tungsten fabric - overview.pptx
 
AWS re:Invent 2016: Making Every Packet Count (NET404)
AWS re:Invent 2016: Making Every Packet Count (NET404)AWS re:Invent 2016: Making Every Packet Count (NET404)
AWS re:Invent 2016: Making Every Packet Count (NET404)
 
KubeCon EU 2016: Creating an Advanced Load Balancing Solution for Kubernetes ...
KubeCon EU 2016: Creating an Advanced Load Balancing Solution for Kubernetes ...KubeCon EU 2016: Creating an Advanced Load Balancing Solution for Kubernetes ...
KubeCon EU 2016: Creating an Advanced Load Balancing Solution for Kubernetes ...
 
DPDK Summit 2015 - HP - Al Sanders
DPDK Summit 2015 - HP - Al SandersDPDK Summit 2015 - HP - Al Sanders
DPDK Summit 2015 - HP - Al Sanders
 
Open stackaustinmeetupsept21
Open stackaustinmeetupsept21Open stackaustinmeetupsept21
Open stackaustinmeetupsept21
 
The Journey to the Kubernetes networking.pdf
The Journey to the Kubernetes networking.pdfThe Journey to the Kubernetes networking.pdf
The Journey to the Kubernetes networking.pdf
 
VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead
 
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
 
Design and Performance Characteristics of Tap-as-a-Service
Design and Performance Characteristics of Tap-as-a-ServiceDesign and Performance Characteristics of Tap-as-a-Service
Design and Performance Characteristics of Tap-as-a-Service
 
Service Assurance for Virtual Network Functions in Cloud-Native Environments
Service Assurance for Virtual Network Functions in Cloud-Native EnvironmentsService Assurance for Virtual Network Functions in Cloud-Native Environments
Service Assurance for Virtual Network Functions in Cloud-Native Environments
 
Kubernetes Internals
Kubernetes InternalsKubernetes Internals
Kubernetes Internals
 
Support of containerized workloads in ONAP
Support of containerized workloads in ONAPSupport of containerized workloads in ONAP
Support of containerized workloads in ONAP
 
Replacing iptables with eBPF in Kubernetes with Cilium
Replacing iptables with eBPF in Kubernetes with CiliumReplacing iptables with eBPF in Kubernetes with Cilium
Replacing iptables with eBPF in Kubernetes with Cilium
 
Presentation oracle net services
Presentation    oracle net servicesPresentation    oracle net services
Presentation oracle net services
 
Accelerated SDN in Azure
Accelerated SDN in AzureAccelerated SDN in Azure
Accelerated SDN in Azure
 
ddsf-student-presentation_756205.pptx
ddsf-student-presentation_756205.pptxddsf-student-presentation_756205.pptx
ddsf-student-presentation_756205.pptx
 
Summit 16: How to Compose a New OPNFV Solution Stack?
Summit 16: How to Compose a New OPNFV Solution Stack?Summit 16: How to Compose a New OPNFV Solution Stack?
Summit 16: How to Compose a New OPNFV Solution Stack?
 

Recently uploaded

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 

Recently uploaded (20)

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 

Scaling Kubernetes to Support 50000 Services.pptx

  • 1. Scaling Kubernetes to Support 50,000 Services Haibin Michael Xie, Senior Staff Engineer/Architect, Huawei Quinton Hoole, Technical Vice President, Huawei
  • 2. Agenda • Challenges while scaling services • Solutions and prototypes • Performance data • Q&A
  • 3. Master What are the Challenges while Scaling Services • Control plane (Master, kubelet, kube-proxy) API Server Controller Manager Scheduler Node KubeProxy … ETCD Kubelet Load Balancer • Deploy services and pods • Propagate endpoints • Add/remove services in load balancer • Network access to services • Data plane (load balancer) Pod Pod Pod Node KubeProxy Kubelet Load Balancer Pod Pod Pod
  • 4. API Server Control Plane ETCD services Controller Manager pods endpoints Endpoints Controller Node KubeProxy Node KubeProxy Node KubeProxy … … N nodes per cluster M pods per second QPS: N*M endpoints per second Service deployed Pod deployed and scheduled
  • 6. API Server Control Plane ETCD services Controller Manager pods endpoints Endpoints Controller Node KubeProxy Node KubeProxy Node KubeProxy … … N nodes per cluster M pods per second QPS: N*M endpoints per second Load: N*M*(M+1)/2 addresses per second
  • 7. Control Plane Solution Alternatives 1. Partition endpoints object into multiple objects • Pros: reduce Endpoints object size • Cons: increase # of objects and requests 2. Central load balancer • Pros: reduce connections and requests to API server • Cons: one more hop in service routing, require strong HA, limited LB scalability 3. Batch creating/updating endpoints • Timer based, no change to data structure in ETCD • Pros: reduce QPS • Cons: E2E latency is increased by Batch interval
  • 8. API Server Control Plane Solution ETCD services Controller Manager pods endpoints Endpoints Controller Node KubeProxy Node KubeProxy Node KubeProxy … … QPS: N*M per second Load: N*M*(M+1)/2 addresses per second Timer and batch QPS: N*M per second Load: N*M*(M+1)/2 addresses per second QPS: N per second Load: N*M addresses per second N nodes per cluster M pods per second
  • 9. Batch Processing Request Reduction One batch per 0.5 second. ➢ QPS:reduced 98% Pods per Service Number of Service EndPoints Controller # of Requests Before After Reduction 10 100 551 10 98.2% 150 785 14 98.2% 200 1105 17 98.5% Test setup: 1 Master,4 slaves 16 core 2.60GHz, 48GB RAM
  • 10. Batch Processing E2E Latency Reduction Latency: reduced 60+% Pods per Service Number of Service E2E Latency (Second) Before After Reduction 10 100 8.5 3.5 59.1% 150 13.5 5.3 60.9% 200 22.8 7.8 65.8%
  • 11. Data Plane • What is IPTables? • iptables is a user-space application that allows configuring Linux kernel firewall (implemented on top of Netfilter) by configuring chains and rules. • What is Netfilter? A framework provided by the Linux kernel that allows customization of networking-related operations, such as packet filtering, NAT, port translation etc. • Issues with IPTables as a load balancer • Latency to access service (routing latency) • Latency to add/remove rules
  • 12. IPTables Example # Iptables –t nat –L –n Chain PREROUTING (policy ACCEPT) target prot opt source destination KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */ ← 1 DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL Chain KUBE-SEP-G3MLSGWVLUPEIMXS (1 references) ← 4 target prot opt source destination MARK all -- 172.16.16.2 anywhere /* default/webpod-service: */ MARK set 0x4d415351 DNAT tcp -- anywhere anywhere /* default/webpod-service: */ tcp to:172.16.16.2:80 Chain KUBE-SEP-OUBP2X5UG3G4CYYB (1 references) target prot opt source destination MARK all -- 192.168.190.128 anywhere /* default/kubernetes: */ MARK set 0x4d415351 DNAT tcp -- anywhere anywhere /* default/kubernetes: */ tcp to:192.168.190.128:6443 Chain KUBE-SEP-PXEMGP3B44XONJEO (1 references) ← 4 target prot opt source destination MARK all -- 172.16.91.2 anywhere /* default/webpod-service: */ MARK set 0x4d415351 DNAT tcp -- anywhere anywhere /* default/webpod-service: */ tcp to:172.16.91.2:80 Chain KUBE-SERVICES (2 references) ← 2 target prot opt source destination KUBE-SVC-N4RX4VPNP4ATLCGG tcp -- anywhere 192.168.3.237 /* default/webpod-service: cluster IP */ tcp dpt:http KUBE-SVC-6N4SJQIF3IX3FORG tcp -- anywhere 192.168.3.1 /* default/kubernetes: cluster IP */ tcp dpt:https KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL Chain KUBE-SVC-6N4SJQIF3IX3FORG (1 references) target prot opt source destination KUBE-SEP-OUBP2X5UG3G4CYYB all -- anywhere anywhere /* default/kubernetes: */ Chain KUBE-SVC-N4RX4VPNP4ATLCGG (1 references) ← 3 target prot opt source destination KUBE-SEP-G3MLSGWVLUPEIMXS all -- anywhere anywhere /* default/webpod-service: */ statistic mode random probability 0.50000000000 KUBE-SEP-PXEMGP3B44XONJEO all -- anywhere anywhere /* default/webpod-service: */
  • 13. IPTables Service Routing Performance 1 Service (µs) 1000 Services (µs) 10000 Services (µs) 50000 Services (µs) First Service 575 614 1023 1821 Middle Service 575 602 1048 4174 Last Service 575 631 1050 7077 In this test, there is one entry per service in KUBE-SERVICES chain. Where is latency generated? • Long list of rules in a chain • Enumerate through the list to find a service and pod
  • 14. Latency to Add IPTables Rules • Where is the latency generated? • not incremental • copy all rules • make changes • save all rules back • IPTables locked during rule update • Time spent to add one rule when there are 5k services (40k rules): 11 minutes • 20k services (160k rules): 5 hours
  • 15. Data Plane Solution • Re-structure IPTables using a search tree ○ Performance improvement • Replace IPTables with “IP Virtual Server” (IPVS) ○ Dramatic performance ○ Much smarter load balancing ■ weighted ■ retries ■ … etc
  • 16. VIP Restructure IPTables as a Search Tree 10.10.0.0/16 10.10.1.0/24 VIP: 10.10.1.5 VIP:10.10.1.100 10.10.100.0/24 VIP:10.10.100.1 Service VIP range: 10.10.0.0/16 CIDR list = [16, 24], defines tree layout Create 3 services: 10.10.1.5, 10.10.1.100, 10.10.100.1 Search tree based service routing time complexity: , m is tree depth Original service routing time complexity: O(n)
  • 17. What is IP Virtual Server (IPVS)? • Transport layer load balancer which directs requests for TCP and UDP based services to real servers. • Part of the Linux Virtual Server (LVS) project. • As with IPTables, IPVS is built on top of Netfilter. • Supports 3 load balancing modes: • NAT • DR (L2 load balancing via MAC rewriting) • IP Tunneling
  • 18. IPVS vs. IPTables IPTables: • Manipulates tables provided by the linux firewall • IPTables is more flexible and can manipulate packets at different stages: Pre- routing, post-routing, forward, input, output. • IPTables has more operations: SNAT, DNAT, reject packets, port translation etc. Why use IPVS? • Better performance (Hashing vs. Chains) • More load balancing algorithms • Round robin, source/destination hashing. • Based on least load, least connection or locality, can assign weights to servers. • Supports server health checks and connection retries • Supports sticky sessions
  • 19. IPVS Load Balancing Mode in Kubernetes • Not publically released yet • No Kubernetes behavior changes, fully compatible: • external IP, nodePort etc • Added kube-proxy startup parameter mode=IPVS, in addition to original modes: mode=userspace and mode=iptables • Original kube-proxy lines of code: 11800 • IPVS mode adds 680 lines of code • dependent on seasaw library
  • 20. IPVS vs. IPTables Latency to Add Rules # of Services 1 5,000 20,000 # of Rules 8 40,000 160,000 IPTables 2 ms 11 min 5 hours IPVS 2 ms 2 ms 2 ms Measured by iptables and ipvsadm, observations: ➢ In IPTables mode, latency to add rule increases significantly when # of service increases ➢ In IPVS mode, latency to add VIP and backend IPs does not increase when # of service increases
  • 21. IPVS vs. IPTables Network Bandwidth ➢Measured by qperf ➢Bandwidth, QPS, Latency have similar pattern ➢Env: 1 master, 4 slaves, 8 pods, all services use these 8 pods ➢Each service exposes 4 ports (4 entries in KUBE-SERVICES chain) ith service first first last first last first last first last first last # of services 1 1000 1000 5000 5000 10000 10000 25000 25000 50000 50000 Bandwidth, IPTables (MB/S) 66.6 64 56 50 38.6 15 6 0 0 0 0 Bandwidth, IPVS (MB/S) 65.3 61.7 55.3 53.5 53.8 43 43.5 30 28.5 24 23.8
  • 22. Other Perf/Scalability Work We Have Done... • Scale nodes and pods in single cluster • Reduce E2E latency of deploying pods/services • Increase pod deployment throughput • Improve scheduling performance • Optimistic concurrent scheduler • Flow graph based high performance scheduling and rescheduling with the University of Cambridge (UK).