Deep dive in container service discovery

Laurent Bernaille, @lbernail
Staff Engineer, Datadog

Service discovery and traffic load-balancing in the container ecosystem rely on different technologies, such as IPVS and iptables, and container orchestrators use different approaches. This talk will present in detail how Docker Swarm and Kubernetes achieve this. The talk will continue with a demo showing how applications that are not managed by Kubernetes can take advantage of its native load-balancing. Finally, it will compare these approaches to service-mesh solutions.

  1. Deep Dive in Container Service Discovery. Laurent Bernaille, @lbernail, Staff Engineer, Datadog
  2. Agenda: Service Discovery, Load-balancing, L7 Load-balancing
  3. Service Discovery
  4. Service Discovery. “Service discovery is the automatic detection of devices and services offered by these devices on a computer network” (https://en.wikipedia.org/wiki/Service_discovery). Why has this topic become so important?
  5. Service discovery in Kubernetes: creating a deployment and a service.
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: echodeploy
        labels:
          app: echo
      spec:
        replicas: 3
        selector:
          matchLabels:
            app: echo
        template:
          metadata:
            labels:
              app: echo
          spec:
            containers:
            - name: echopod
              image: lbernail/echo:0.5
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: echo
        labels:
          app: echo
      spec:
        type: ClusterIP
        selector:
          app: echo
        ports:
        - name: http
          protocol: TCP
          port: 80
          targetPort: 5000
  6. Created Kubernetes objects: the Deployment owns a ReplicaSet, which owns Pod 1, Pod 2 and Pod 3 (label: app=echo); the Service selects app=echo.
      kubectl get all
      NAME                             AGE
      deploy/echodeploy                16s
      NAME                             AGE
      rs/echodeploy-75dddcf5f6         16s
      NAME                             READY
      po/echodeploy-75dddcf5f6-jtjts   1/1
      po/echodeploy-75dddcf5f6-r7nmk   1/1
      po/echodeploy-75dddcf5f6-zvqhv   1/1
      NAME       TYPE        CLUSTER-IP
      svc/echo   ClusterIP   10.200.246.139
  7. The endpoint object: the Service (selector: app=echo) is backed by an Endpoints object listing the addresses of the pods owned by the Deployment's ReplicaSet.
      kubectl describe endpoints echo
      Name:         echo
      Namespace:    datadog
      Labels:       app=echo
      Annotations:  <none>
      Subsets:
        Addresses:          10.150.4.10,10.150.6.16,10.150.7.10
        NotReadyAddresses:  <none>
        Ports:
          Name  Port  Protocol
          ----  ----  --------
          http  5000  TCP
  8. Pod readiness
      ● A pod can be started but not ready to serve requests
        ○ Initialization
        ○ Connection to backends
      ● Kubernetes provides an abstraction for this: readiness probes
      readinessProbe:
        httpGet:
          path: /ready
          port: 5000
        periodSeconds: 2
        successThreshold: 2
        failureThreshold: 2
  9. Demo
      kubectl run -it test --image appropriate/curl ash
      # while true ; do curl 10.200.246.139 ; sleep 1 ; done
      Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
      Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
      Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
      Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
      Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
      Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
      Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
      Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
      Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
  10. Demo
      kubectl exec -it <curl pod> sh
      # curl <podip>:5000/ready
      Ready : True
      # curl <podip>:5000/toggleReady
      # curl <podip>:5000/ready
      Ready : False
      kubectl get pods
      NAME                             READY
      echodeploy-75dddcf5f6-jtjts      1/1
      echodeploy-75dddcf5f6-r7nmk      1/1
      echodeploy-75dddcf5f6-zvqhv      0/1
      kubectl describe endpoints echo
      Addresses: 10.150.4.10,10.150.6.16
      kubectl describe pod echodeploy-75dddcf5f6-zvqhv
      Warning  Unhealthy  (Readiness probe failed)
  11. How does this all work? On each node, the kubelet health-checks its pods and sends status updates to the API server, which stores pod state in etcd.
  12. How does this all work? The endpoint controller (in the controller manager) watches pods and services through the API server. To sync endpoints it lists the pods matching the service selector and adds their IPs to the Endpoints object, stored in etcd alongside pods and services.
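      One way to watch the endpoint controller at work (a minimal check, reusing the echo objects from the earlier slides):
      kubectl get endpoints echo --watch
      # in a second terminal, change the number of matching pods and watch the addresses update
      kubectl scale deployment echodeploy --replicas=4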
  13. Load-Balancing
  14. Load-balancing solutions
      DNS Round Robin
      ● Service has a DNS record with one entry per endpoint
      ● Many clients will only use the first IP
      ● Many clients will perform resolution only at startup
      Virtual IP + IP-based load-balancing
      ● Service has a single VIP
      ● Traffic sent to this VIP is load-balanced to endpoint IPs
      => Requires a “process” to perform and configure this load-balancing
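      Kubernetes takes the VIP approach: the service DNS name resolves to the single ClusterIP, and kube-proxy load-balances behind it. A quick check from a pod, assuming the default cluster.local domain and the echo service created earlier (namespace datadog):
      nslookup echo.datadog.svc.cluster.local
      # expected answer: the ClusterIP (10.200.246.139), not the pod IPs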
  15. Load-balancing in Kubernetes: kube-proxy runs on every node and watches services and endpoints through the API server; its proxier programs the local load-balancing rules. The endpoint controller keeps endpoints in sync as before (list pods matching the selector, add their IPs).
  16. Load-balancing in Kubernetes: a client on one node reaches pod 1 on node B or pod 2 on node C through the rules programmed locally by kube-proxy.
  17. Kube-proxy modes
      ● userspace: original implementation, userland TCP/UDP proxy
      ● iptables: default since Kubernetes 1.2, uses iptables to load-balance traffic, faster than userspace
      ● ipvs: uses kernel load-balancing, still relies on iptables for some NAT rules, faster than iptables and scales better with a large number of services/endpoints
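      The mode is chosen when kube-proxy starts; a minimal sketch (flag only, the rest of the invocation depends on how kube-proxy is deployed):
      kube-proxy --proxy-mode=ipvs    # or iptables (the default) / userspace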
  18. IPTABLES Load-Balancing
  19. iptables overview: kube-proxy on node A programs local iptables rules; a client on node A reaches pod 1 (node B) or pod 2 (node C).
      Outgoing traffic
      1. Client to Service IP
      2. DNAT: Client to Pod1 IP
      Reverse path
      1. Pod1 IP to Client
      2. Reverse NAT: Service IP to client
  20. proxy-mode = iptables
      PREROUTING / OUTPUT: any / any => KUBE-SERVICES
      All traffic is processed by kube chains
  21. proxy-mode = iptables
      PREROUTING / OUTPUT: any / any => KUBE-SERVICES
      KUBE-SERVICES: any / VIP:PORT => KUBE-SVC-XXX
      Global service chain: identify the service and jump to the appropriate service chain
  22. proxy-mode = iptables
      PREROUTING / OUTPUT: any / any => KUBE-SERVICES
      KUBE-SERVICES: any / VIP:PORT => KUBE-SVC-XXX
      KUBE-SVC-XXX:
        any / any proba 33% => KUBE-SEP-AAA
        any / any proba 50% => KUBE-SEP-BBB
        any / any           => KUBE-SEP-CCC
      Service chain (one per service): uses the statistic iptables module (probability of a rule being applied); rules are evaluated sequentially (hence the 33%, 50%, 100%)
  23. proxy-mode = iptables
      PREROUTING / OUTPUT: any / any => KUBE-SERVICES
      KUBE-SERVICES: any / VIP:PORT => KUBE-SVC-XXX
      KUBE-SVC-XXX:
        any / any proba 33% => KUBE-SEP-AAA
        any / any proba 50% => KUBE-SEP-BBB
        any / any           => KUBE-SEP-CCC
      KUBE-SEP-AAA:
        endpoint IP / any => KUBE-MARK-MASQ
        any / any         => DNAT endpoint IP:Port
      Endpoint chain: mark hairpin traffic (client = target) for SNAT, then DNAT to the endpoint
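      Put together, the generated NAT rules look roughly like the excerpt below (a sketch: the chain suffixes are made up and real rules also carry comment matches; the service and pod IPs are the ones from the earlier slides):
      # iptables-save -t nat (excerpt)
      -A PREROUTING -j KUBE-SERVICES
      -A OUTPUT -j KUBE-SERVICES
      -A KUBE-SERVICES -d 10.200.246.139/32 -p tcp -m tcp --dport 80 -j KUBE-SVC-XXXXXXXXXXXXXXXX
      -A KUBE-SVC-XXXXXXXXXXXXXXXX -m statistic --mode random --probability 0.33333 -j KUBE-SEP-AAAAAAAAAAAAAAAA
      -A KUBE-SVC-XXXXXXXXXXXXXXXX -m statistic --mode random --probability 0.50000 -j KUBE-SEP-BBBBBBBBBBBBBBBB
      -A KUBE-SVC-XXXXXXXXXXXXXXXX -j KUBE-SEP-CCCCCCCCCCCCCCCC
      -A KUBE-SEP-AAAAAAAAAAAAAAAA -s 10.150.4.10/32 -j KUBE-MARK-MASQ
      -A KUBE-SEP-AAAAAAAAAAAAAAAA -p tcp -m tcp -j DNAT --to-destination 10.150.4.10:5000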
  24. Edge case: hairpin traffic. The client can also be a destination: after DNAT, Src IP = Pod1 and Dst IP = Pod1, so no reverse NAT is possible => SNAT on the host for this traffic (each step below shows source => destination).
      1. Pod1 IP => SVC IP
      2. SNAT: Host IP => SVC IP
      3. DNAT: Host IP => Pod1 IP
      Reverse path
      1. Pod1 IP => Host IP
      2. Reverse NAT: SVC IP => Pod1 IP
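      The mark-and-masquerade pair behind this SNAT looks roughly like the following (a sketch; 0x4000 is kube-proxy's default masquerade mark, also visible on the IPVS slide later):
      -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
      -A KUBE-POSTROUTING -m mark --mark 0x4000/0x4000 -j MASQUERADE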
  25. Persistency
      spec:
        type: ClusterIP
        sessionAffinity: ClientIP
        sessionAffinityConfig:
          clientIP:
            timeoutSeconds: 600
      KUBE-SEP-AAA:
        endpoint IP / any => KUBE-MARK-MASQ
        any / any         => DNAT endpoint IP:Port + recent: set rsource KUBE-SEP-AAA
      Uses the “recent” module: adds the source IP to a set named KUBE-SEP-AAA
  26. Persistency
      KUBE-SEP-AAA:
        endpoint IP / any => KUBE-MARK-MASQ
        any / any         => DNAT endpoint IP:Port + recent: set rsource KUBE-SEP-AAA
      (recent module: add the source IP to the set named KUBE-SEP-AAA)
      KUBE-SVC-XXX:
        any / any recent: rcheck set KUBE-SEP-AAA => KUBE-SEP-AAA
        any / any recent: rcheck set KUBE-SEP-BBB => KUBE-SEP-BBB
        any / any recent: rcheck set KUBE-SEP-CCC => KUBE-SEP-CCC
      Load-balancing rules use the recent module: if the source IP is in the set named KUBE-SEP-AAA, jump to KUBE-SEP-AAA
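      In rule form the affinity looks roughly like this (same hypothetical chain suffixes as above; the 600-second window comes from sessionAffinityConfig):
      -A KUBE-SVC-XXXXXXXXXXXXXXXX -m recent --name KUBE-SEP-AAAAAAAAAAAAAAAA --rcheck --seconds 600 --reap -j KUBE-SEP-AAAAAAAAAAAAAAAA
      -A KUBE-SEP-AAAAAAAAAAAAAAAA -p tcp -m recent --name KUBE-SEP-AAAAAAAAAAAAAAAA --set --rsource -m tcp -j DNAT --to-destination 10.150.4.10:5000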
  27. Demos: chains, hairpin traffic, persistency
      kubectl exec echodeploy-xxxx -it sh
      # hostname -i
      10.1.161.2
      # while true ; do wget -q -O - 10.200.20.164 ; sleep 1 ; done
      Container: 10.1.162.5 | Source: 10.1.161.2 | Version: Unknown
      Container: 10.1.161.2 | Source: 10.1.161.1 | Version: Unknown
      Container: 10.1.163.2 | Source: 10.1.161.2 | Version: Unknown
  28. iptables proxy gotchas: rules synchronization (every sync flushes and reloads all Kubernetes chains), performance, design
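      A quick way to get a feel for how many rules each sync has to rewrite (a simple check, not part of the talk's demo):
      sudo iptables-save -t nat | grep -c '^-A KUBE'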
  29. IPVS Load-Balancing
  30. proxy-mode = ipvs
      ● L4 load-balancer built into the Linux kernel
      ● Many load-balancing algorithms
      ● Very fast
      ● Still relies on iptables for some use cases (SNAT in particular)
  31. IPVS demo: virtual server and dummy interface
      $ sudo ipvsadm --list --numeric --tcp-service 10.200.200.68:80
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port   Forward  Weight  ActiveConn  InActConn
      TCP  10.200.200.68:http rr
        -> 10.1.242.2:5000      Masq     1       0           0
        -> 10.1.243.2:5000      Masq     1       0           0
      $ sudo ip -d addr show kube-ipvs0
      3: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN group default
          link/ether da:c8:87:73:ac:d4 brd ff:ff:ff:ff:ff:ff promiscuity 0
          dummy numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
          inet 10.200.200.68/32 brd 10.200.200.68 scope global kube-ipvs0
             valid_lft forever preferred_lft forever
  32. IPVS hairpin traffic: same as iptables but uses ipset. When src & dst == endpoint IP => SNAT. ipsets are much faster than long lists of simple iptables rules.
      $ sudo iptables -t nat -L KUBE-POSTROUTING
      Chain KUBE-POSTROUTING (1 references)
      target      prot opt source    destination
      MASQUERADE  all  --  anywhere  anywhere     mark match 0x4000/0x4000
      MASQUERADE  all  --  anywhere  anywhere     match-set KUBE-LOOP-BACK dst,dst,src
      $ sudo ipset -L KUBE-LOOP-BACK
      Name: KUBE-LOOP-BACK
      Type: hash:ip,port,ip
      Members:
      10.1.243.2,tcp:5000,10.1.243.2
      10.1.242.2,tcp:5000,10.1.242.2
  33. Persistency: a native option of IPVS virtual services
      $ sudo ipvsadm --list --numeric --tcp-service 10.200.200.68:80
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port   Forward  Weight  ActiveConn  InActConn
      TCP  10.200.200.68:80 rr persistent 600
        -> 10.1.242.2:5000      Masq     1       0           0
        -> 10.1.243.2:5000      Masq     1       0           0
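      For reference, an equivalent virtual service could be built by hand with ipvsadm (a sketch of what kube-proxy effectively programs, using the IPs from the demo):
      sudo ipvsadm -A -t 10.200.200.68:80 -s rr -p 600            # virtual service, round-robin, 600s persistence
      sudo ipvsadm -a -t 10.200.200.68:80 -r 10.1.242.2:5000 -m   # real server, masquerading (NAT)
      sudo ipvsadm -a -t 10.200.200.68:80 -r 10.1.243.2:5000 -m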
  34. IPVS status
      Not considered stable yet. Much better performance:
      ● No chain traversal: faster DNAT
      ● No full reload to add an endpoint / service: much faster updates
      ● See “Scale Kubernetes to support 50000 services”, Haibin Michael Xie (LinuxCon China)
      Definitely the future of kube-proxy
  35. Alternatives to kube-proxy
      Kube-router
      ● https://github.com/cloudnativelabs/kube-router
      ● Pod networking with BGP
      ● Network policies
      ● IPVS-based service proxy
      Cilium
      ● Relies on eBPF to implement service proxying
      ● Implements security policies with eBPF
      ● Really promising
      Other
      ● Very dynamic area, expect to see other solutions
  36. What about DNS? It is just another Kube Service: DNS pods run behind a service VIP programmed by kube-proxy like any other service, and they get DNS info from the API server.
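      Easy to verify on a cluster (assuming the conventional kube-dns service name in kube-system; CoreDNS deployments usually keep that name):
      kubectl -n kube-system get svc kube-dns
      kubectl -n kube-system get endpoints kube-dns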
  37. Access services from outside kube
      ● Run kube-proxy on an external VM
      ● Requires routable pod IPs
      ● DNS
  38. Access services from outside kube: the external VM runs kube-proxy and iptables, watching the API server; a client on the VM reaches the service pods on the cluster nodes through the service VIP.
  39. Access services from outside kube: the VM also runs dnsmasq, which forwards cluster DNS queries to the DNS pods, so the client can resolve service names as well.
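      A minimal dnsmasq forwarding rule for this setup could look like the line below (a sketch: 10.200.0.10 stands in for the cluster DNS service VIP and cluster.local for the cluster domain, both hypothetical here):
      # /etc/dnsmasq.d/kube.conf
      server=/cluster.local/10.200.0.10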
  40. L7 Load-balancing
  41. L7 load-balancing options: Ingress controllers, service mesh (Istio)
  42. Key takeaways
      Complicated under the hood
      ● Helps to know where to look when debugging complex setups
      Service discovery
      ● Challenge: integrating with hosts outside of Kubernetes
      Load-balancing
      ● L4 is still very dynamic (IPVS, eBPF)
      ● L7 is only getting started, expect to see a lot
  43. Thank you. We’re hiring! Questions / comments: @lbernail, https://github.com/lbernail/dockercon2018
