Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Kubernetes Networking

17,282 views

Published on

Intro to Kubernetes Networking at the Seattle Kubernetes Meetup, December 2, 2015.

Published in: Software
  • @David Chua Actually it's a set of iptables chains and rules that redirect packets to Pod IPs. What these rules really do is mainly a DNAT (vIP->PodIP) rule.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Thanks for the presentation. Was watching the original presentation on Youtube "Kubernetes Meetup on Networking". Very helpful in understanding the fundamentals of Kubernetes Networking. Got a quick question, on Slide 75, when the Client hits 10.0.123.45 which is a Virtual IP to the Service, where does it actually hit? As Service is an abstraction, which node does it hit to start the randomizer to hit the backend pods?
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Kubernetes Networking

  1. 1. Kubernetes Networking Seattle Kubernetes Meetup CJ Cullen <cjcullen@google.com> Software Engineer @cj_cullen github.com/cjcullen
  2. 2. Docker Networking
  3. 3. Docker networking docker start ...
  4. 4. Docker networking docker start ...
  5. 5. Docker networking docker0 172.16.1.0/24
  6. 6. Docker networking docker0 172.16.1.0/24 docker run ...
  7. 7. Docker networking docker0 172.16.1.0/24
  8. 8. Docker networking docker0 172.16.1.0/24 172.16.1.1 vethAQ2IT eth0
  9. 9. Docker networking docker0 172.16.1.0/24 172.16.1.1 vethAQ2IT eth0 docker run ...
  10. 10. Docker networking docker0 172.16.1.0/24 172.16.1.1 vethAQ2IT eth0 172.16.1.2 vethS1LUI eth0
  11. 11. 172.16.1.1 172.16.1.2 Docker networking 172.16.1.1 172.16.1.1
  12. 12. 172.16.1.1 172.16.1.2 Docker networking 172.16.1.1 172.16.1.1 NAT NAT NAT NAT NAT
  13. 13. Host ports A: 172.16.1.1 3306 B: 172.16.1.2 80 9376 11878SNAT SNAT C: 172.16.1.1 8000
  14. 14. Host ports A: 172.16.1.1 3306 B: 172.16.1.2 80 9376 11878SNAT SNAT C: 172.16.1.1 8000 REJECTED
  15. 15. Kubernetes Networking
  16. 16. Kubernetes networking IPs are routable • vs docker default private IP Pods can reach each other without NAT • even across nodes No brokering of port numbers • too complex, why bother? This is a fundamental requirement • can be L3 routed • can be underlayed (cloud) • can be overlayed (SDN)
  17. 17. 10.1.1.0/24 10.1.1.1 10.1.1.2 Kubernetes networking 10.1.2.0/24 10.1.2.1 10.1.3.0/24 10.1.3.1
  18. 18. 10.1.1.0/24 10.1.1.1 10.1.1.2 Kubernetes networking 10.1.2.0/24 10.1.2.1 10.1.3.0/24 10.1.3.1 ?
  19. 19. Kubernetes networking On GCE/GKE • GCE Advanced Routes (program the fabric) • “Everything to 10.1.1.0/24, send to this VM” Plenty of other ways • AWS: Route Tables • Weave • Calico • Flannel • OVS • OpenContrail • Cisco Contiv • Others...
  20. 20. Kubernetes networking On GCE/GKE • GCE Advanced Routes (program the fabric) • “Everything to 10.1.1.0/24, send to this VM” Plenty of other ways • AWS: Route Tables • Weave • Calico • Flannel • OVS • OpenContrail • Cisco Contiv • Others...
  21. 21. Kubernetes networking On GCE/GKE • GCE Advanced Routes (program the fabric) • “Everything to 10.1.1.0/24, send to this VM” Plenty of other ways • AWS: Route Tables • Weave • Calico • Flannel • OVS • OpenContrail • Cisco Contiv • Others...
  22. 22. Pods
  23. 23. Pods Small group of containers & volumes Tightly coupled The atom of scheduling & placement Shared namespace • share IP address & localhost • share IPC, etc. Managed lifecycle • bound to a node, restart in place • can die, cannot be reborn with same ID Example: data puller & web server Consumers Content Manager File Puller Web Server Volume Pod
  24. 24. Pods Small group of containers & volumes Tightly coupled The atom of scheduling & placement Shared namespace • share IP address & localhost • share IPC, etc. Managed lifecycle • bound to a node, restart in place • can die, cannot be reborn with same ID Example: data puller & web server 10.1.1.2
  25. 25. Pods Small group of containers & volumes Tightly coupled The atom of scheduling & placement Shared namespace • share IP address & localhost • share IPC, etc. Managed lifecycle • bound to a node, restart in place • can die, cannot be reborn with same ID Example: data puller & web server c1 --net=container:infra --ipc=container:infra infra 10.1.1.2 c2 --net=container:infra --ipc=container:infra
  26. 26. Services
  27. 27. Services A group of pods that work together • grouped by a selector Defines access policy • “load balanced” or “headless” Gets a stable virtual IP and port • sometimes called the service portal • also a DNS name VIP is managed by kube-proxy • watches all services • updates iptables when backends change Hides complexity - ideal for non-native apps Client Virtual IP
  28. 28. kube-proxy
  29. 29. kube-proxy (legacy) iptables kube-proxy apiserver Node X
  30. 30. iptables apiserver Node X watch services & endpoints kube-proxy (legacy) kube-proxy
  31. 31. iptables apiserver Node X kubectl run ... watch kube-proxy (legacy) kube-proxy
  32. 32. iptables apiserver Node X schedule watch kube-proxy (legacy) kube-proxy
  33. 33. iptables apiserver Node X watch kubectl expose ... kube-proxy (legacy) kube-proxy
  34. 34. iptables apiserver Node X new service! update kube-proxy (legacy) kube-proxy
  35. 35. iptables apiserver Node X watch kube-proxy (legacy) kube-proxy listen
  36. 36. iptables apiserver Node X watch kube-proxy (legacy) kube-proxy listen
  37. 37. iptables apiserver Node X watch configure kube-proxy (legacy) kube-proxy
  38. 38. iptables apiserver Node X watch VIP kube-proxy (legacy) kube-proxy
  39. 39. iptables apiserver Node X new endpoints! update VIP kube-proxy (legacy) kube-proxy
  40. 40. iptables apiserver Node X VIP watch kube-proxy (legacy) kube-proxy
  41. 41. iptables apiserver Node X VIP watch kube-proxy (legacy) kube-proxy Client
  42. 42. iptables apiserver Node X VIP watch kube-proxy (legacy) kube-proxy Client
  43. 43. iptables apiserver Node X VIP watch kube-proxy (legacy) kube-proxy Client
  44. 44. iptables apiserver Node X VIP watch kube-proxy (legacy) kube-proxy Client
  45. 45. kube-proxy (legacy) Userspace proxy isn’t ideal Burns CPU copying bytes • “Proxy” is just parallel copy loops. Loses source IP • Everything looks like it’s from the node IP. Userspace TCP listening = higher latency
  46. 46. iptables kube-proxy
  47. 47. iptables kube-proxy iptables kube-proxy apiserver Node X
  48. 48. iptables kube-proxy iptables kube-proxy apiserver Node X watch services & endpoints
  49. 49. iptables kube-proxy iptables kube-proxy apiserver Node X kubectl run ... watch
  50. 50. iptables kube-proxy iptables kube-proxy apiserver Node X schedule watch
  51. 51. iptables kube-proxy iptables kube-proxy apiserver Node X watch kubectl expose ...
  52. 52. iptables kube-proxy iptables kube-proxy apiserver Node X new service! update
  53. 53. iptables kube-proxy iptables kube-proxy apiserver Node X watch configure
  54. 54. iptables kube-proxy iptables kube-proxy apiserver Node X watch VIP
  55. 55. iptables kube-proxy iptables kube-proxy apiserver Node X new endpoints! update VIP
  56. 56. iptables kube-proxy iptables kube-proxy apiserver Node X VIP watch configure
  57. 57. iptables kube-proxy iptables kube-proxy apiserver Node X VIP watch
  58. 58. iptables kube-proxy iptables kube-proxy apiserver Node X VIP watch Client
  59. 59. iptables kube-proxy iptables kube-proxy apiserver Node X VIP watch Client
  60. 60. iptables kube-proxy iptables kube-proxy apiserver Node X VIP watch Client
  61. 61. iptables kube-proxy iptables kube-proxy apiserver Node X VIP watch Client
  62. 62. iptables kube-proxy Mean Latency contrib/for-tests/netperf-tester --number=1000 Mean Latency Microseconds iptables kube-proxy legacy kube-proxy
  63. 63. Services are just an abstraction • Only requirement: route (and maybe load balance) a virtual IP to a set of backends. Kube-proxy is an implementation • Kube-proxy watches apiserver. • iptables is re-configured on changes. There could be other ways • Userspace, iptables, IP Virtual Servers? Services
  64. 64. DNS Run SkyDNS as a pod in the cluster • kube2sky bridges Kubernetes API -> SkyDNS • Tell kubelets about it (static service IP) Strictly optional, but practically required • LOTS of things depend on it • Probably will become more integrated Or plug in your own! kubernetes kubernetes.default kubernetes.default.svc.cluster.local foo.my-namespace.svc.cluster.local
  65. 65. DNS Run SkyDNS as a pod in the cluster • kube2sky bridges Kubernetes API -> SkyDNS • Tell kubelets about it (static service IP) Strictly optional, but practically required • LOTS of things depend on it • Probably will become more integrated Or plug in your own! apiserverwatch etcd kube-dns-qxin kube2skyskyDNS
  66. 66. DNS Run SkyDNS as a pod in the cluster • kube2sky bridges Kubernetes API -> SkyDNS • Tell kubelets about it (static service IP) Strictly optional, but practically required • LOTS of things depend on it • Probably will become more integrated Or plug in your own! nameserver 10.0.0.10 ... /etc/resolv.conf apiserverwatch etcd kube-dns-qxin kube2skyskyDNS
  67. 67. DNS Run SkyDNS as a pod in the cluster • kube2sky bridges Kubernetes API -> SkyDNS • Tell kubelets about it (static service IP) Strictly optional, but practically required • LOTS of things depend on it • Probably will become more integrated Or plug in your own! nameserver 10.0.0.10 ... /etc/resolv.conf apiserverwatch etcd kube-dns-qxin kube2skyskyDNS 10.0.0.10
  68. 68. Putting it Together What happens when I... $ curl foo.my-namespace Client
  69. 69. What happens when I... $ curl foo.my-namespace Putting it Together nameserver 10.0.0.10 ... /etc/resolv.conf Client 10.1.0.1
  70. 70. 10.1.0.1 Putting it Together What happens when I... $ curl foo.my-namespace etcd kube-dns-qxin kube2skyskyDNS 10.0.0.10 foo.my-namespace? Client
  71. 71. Putting it Together What happens when I... $ curl foo.my-namespace etcd kube-dns-qxin kube2skyskyDNS 10.0.0.10 10.0.123.45 Client 10.1.0.1
  72. 72. Putting it Together What happens when I... $ curl foo.my-namespace 10.0.123.45Client 10.1.0.1
  73. 73. Putting it Together What happens when I... $ curl foo.my-namespace Client VIP10.0.123.45 10.1.0.1
  74. 74. Putting it Together What happens when I... $ curl foo.my-namespace Client VIP10.0.123.45 iptables 10.1.0.1
  75. 75. Putting it Together What happens when I... $ curl foo.my-namespace Client VIP10.0.123.45 iptables 10.1.0.1 10.1.0.6 10.1.3.1 10.1.6.3
  76. 76. Putting it Together What happens when I... $ curl foo.my-namespace Client VIP10.0.123.45 iptables 10.1.3.1 10.1.0.1 10.1.0.6 10.1.3.1 10.1.6.3
  77. 77. Putting it Together What happens when I... $ curl foo.my-namespace Client VIP10.0.123.45 iptables 10.1.3.1 10.1.0.1 10.1.0.6 10.1.3.1 10.1.6.3 10.1.3.0/24 -> Node X
  78. 78. Putting it Together What happens when I... $ curl foo.my-namespace Client VIP10.0.123.45 iptables 10.1.3.1 10.1.0.1 10.1.0.6 10.1.3.1 10.1.6.3
  79. 79. Putting it Together What happens when I... $ curl foo.my-namespace Client VIP10.0.123.45 iptables 10.1.3.1 10.1.0.1 10.1.0.6 10.1.3.1 10.1.6.3 Hello World!
  80. 80. Putting it Together What happens when I... $ curl foo.my-namespace Client iptables Hello World! 10.1.0.1 10.1.0.6 10.1.3.1 10.1.6.3 10.1.0.1
  81. 81. Putting it Together What happens when I... $ curl foo.my-namespace Client iptables Hello World! 10.1.0.1 10.1.0.6 10.1.3.1 10.1.6.3 10.1.0.0/24 -> Node Y 10.1.0.1
  82. 82. Putting it Together What happens when I... $ curl foo.my-namespace Client iptables Hello World! 10.1.0.1 10.1.0.6 10.1.3.1 10.1.6.3 10.1.0.0/24 -> Node Y 10.1.0.1
  83. 83. Putting it Together What happens when I... $ curl foo.my-namespace Hello World! Client iptables Hello World! 10.1.0.1 10.1.0.6 10.1.3.1 10.1.6.3 10.1.0.0/24 -> Node Y 10.1.0.1
  84. 84. What about external?
  85. 85. External Services Services IPs are only available inside the cluster Need to receive traffic from “the outside world” Builtin: Service “type” • nodePort: expose on a port on every node • loadBalancer: provision a cloud load-balancer DiY load-balancer solutions • socat (for nodePort remapping) • haproxy • nginx
  86. 86. The Bleeding Edge
  87. 87. Ingress (L7) Services are assumed L3/L4 Lots of apps want HTTP/HTTPS Ingress maps incoming traffic to backend services • by HTTP host headers • by HTTP URL paths HAProxy and GCE implementations No SSL yet Status: BETA in Kubernetes v1.1 URL Map Client
  88. 88. Ingress (L7) Services are assumed L3/L4 Lots of apps want HTTP/HTTPS Ingress maps incoming traffic to backend services • by HTTP host headers • by HTTP URL paths HAProxy and GCE implementations No SSL yet Status: BETA in Kubernetes v1.1 URL Map Client api.company.com api.company.com/foo api.company.com/bar othercompany.com/*
  89. 89. Network Plugins
  90. 90. Network Plugins Introduced in Kubernetes v1.0 • VERY experimental Uses CNI (CoreOS) in v1.1 • Simple exec interface • Not using Docker libnetwork • but can defer to Docker for networking Cluster admins can customize their installs • DHCP, MACVLAN, Flannel, custom net Plugin Plugin Plugin
  91. 91. Kubernetes is Open - open community - open design - open source - open to ideas Networking is Hard - help guide us! http://kubernetes.io https://github.com/kubernetes/kubernetes slack: kubernetes twitter: @kubernetesio

×