How to build a Kubernetes networking solution from scratch

Presented by: Antonin Bas & Jianjun Shen, VMware
Presented at All Things Open 2020

Abstract: For the uninitiated, Kubernetes (K8s) networking can be a bit like dark magic. Many clusters have requirements beyond what the default network plugin, kubenet, can provide, and they require the use of a third-party Container Network Interface (CNI) plugin. But what exactly is the role of these plugins, how do they differ from each other, and how does the choice of one affect your cluster?

In this talk, Antonin and Jianjun will describe how a group of developers was able to build a CNI plugin - an open source project called Antrea - from scratch and bring it to production in a matter of months. This velocity was achieved by leveraging existing open-source technologies extensively: Open vSwitch, a well-established programmable virtual switch for the data plane, and the K8s libraries for the control plane. Antonin and Jianjun will explain the responsibilities of a CNI plugin in the context of K8s and will walk the audience through the steps required to create one. They will show how Antrea integrates with the rest of the cloud-native ecosystem (e.g. dashboards such as Octant and Prometheus) to provide insight into the network and ensure that K8s networking is not just dark magic anymore.

  1. How to Build a Kubernetes Networking Solution from Scratch
     Antonin Bas, Jianjun Shen, Project Antrea maintainers @VMware
     All Things Open, October 2020
  2. Agenda
     - Container and K8s networking
     - Building a K8s network plugin with Open vSwitch
     - Introducing Project Antrea
     - More visibility into K8s networks with Project Antrea
     - Q&A
  3. Basics of Container Networking
     - Network namespace: isolated network environment provided by the Linux kernel
     - Interconnect: a simple way is veth devices and a Linux bridge
     - Communication across hosts: network address translation and port mapping
     Diagram (Docker bridge network on Linux): container1 (netns ns1, 10.10.0.11/24) and container2 (netns ns2, 10.10.0.12/24) attach through veth pairs (veth1, veth2) to the docker0 Linux bridge (10.10.0.1/24) in the Docker host's root netns; outbound traffic is SNATed to the host address 172.1.1.11/16 on ens0.
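     As a rough, hands-on illustration of the veth-plus-Linux-bridge pattern in the diagram (not from the slide; the bridge and namespace names are made up, the addresses follow the slide):

# Bridge in the root netns, playing the role of docker0 in the diagram
ip link add br0 type bridge
ip addr add 10.10.0.1/24 dev br0
ip link set br0 up

# "Container" network namespace, connected to the bridge with a veth pair
ip netns add ns1
ip link add veth1 type veth peer name eth0
ip link set eth0 netns ns1
ip link set veth1 master br0
ip link set veth1 up
ip netns exec ns1 ip addr add 10.10.0.11/24 dev eth0
ip netns exec ns1 ip link set eth0 up
ip netns exec ns1 ip route add default via 10.10.0.1

# Cross-host communication: SNAT container traffic to the host address (172.1.1.11 on the slide)
iptables -t nat -A POSTROUTING -s 10.10.0.0/24 ! -o br0 -j MASQUERADE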
  4. What is Kubernetes?
     Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure.
  5. Kubernetes Components
     A K8s cluster consists of Master(s) and Nodes.
     - K8s Master components: API Server, Scheduler, Controller Manager, etcd
     - K8s Node components: kubelet, kube-proxy, container runtime
     Diagram: the Master(s) run the control-plane components (API server, scheduler, controller manager, key-value store, dashboard); the Nodes run kubelet, the container runtime, and kube-proxy; the cluster is managed through the kubectl CLI.
  6. Kubernetes Pod
     "Pods are the smallest deployable units of computing that you can create and manage in Kubernetes."
     A Pod comprises a group of one or more containers that share an IP address and a network namespace.
     Diagram: a Pod at 10.24.0.2 (from the 10.24.0.0/16 Pod CIDR) whose pause container "owns" the IP stack, shared over IPC by its nginx (tcp/80), mgmt (tcp/22), and logging (udp/514) containers; external IP traffic is addressed to the Pod IP.
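     A minimal sketch of such a Pod, assuming a working cluster and kubectl access; the image and names are illustrative, not from the talk:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
  - name: logging            # second container in the same Pod
    image: busybox
    command: ["sh", "-c", "tail -f /dev/null"]
EOF

# Both containers share one network namespace and one Pod IP
kubectl get pod web -o wide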
  7. Kubernetes Namespace
     "Namespaces are a way to divide cluster resources between multiple users."
     "Namespaces provide a scope for names."
     Namespace-level access control is supported.
     Example: Namespace foo (base URI /api/v1/namespaces/foo) contains the 'redis-master' Pod at /api/v1/namespaces/foo/pods/redis-master and the 'redis' Service at /api/v1/namespaces/foo/services/redis; Namespace bar (base URI /api/v1/namespaces/bar) can contain a Pod and a Service with the same names at /api/v1/namespaces/bar/pods/redis-master and /api/v1/namespaces/bar/services/redis.
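     A quick illustration of the namespace-scoped URIs above (hypothetical commands, assuming cluster access):

kubectl create namespace foo
kubectl -n foo run redis-master --image=redis
# The Pod is addressed by the namespace-scoped path shown on the slide
kubectl get --raw /api/v1/namespaces/foo/pods/redis-master | head -c 200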
  8. Kubernetes Service
     "An abstract way to expose an application running on a set of Pods as a network service."
     Serves multiple functions:
     - Service discovery / DNS
     - East/West load balancing in the cluster (type ClusterIP)
     - External load balancing for L4 TCP/UDP (type LoadBalancer)
     - External access to the Service through the Node IPs (type NodePort)
     Example (a Redis Service in front of Redis Pods 10.24.0.5 and 10.24.2.7, consumed by web front-end Pods):
       kubectl describe svc redis
       Name:                 redis
       Namespace:            default
       Selector:             app=redis
       Type:                 LoadBalancer
       IP:                   172.30.0.24
       LoadBalancer Ingress: 134.247.200.20
       Port:                 <unnamed> 6379/TCP
       Endpoints:            10.24.0.5:6379, 10.24.2.7:6379
     DNS: redis.<ns>.cluster.local resolves to the ClusterIP 172.30.0.24; redis.external.com resolves to the external IP 134.247.200.20.
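     A Service manifest that would produce roughly the output above (an illustrative sketch matching the slide's selector and port, not taken from the talk):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    app: redis         # selects the Redis Pods
  type: LoadBalancer   # also allocates a ClusterIP for in-cluster (East/West) access
  ports:
  - protocol: TCP
    port: 6379         # ClusterIP / external port
    targetPort: 6379   # container port on the Redis Pods
EOF

kubectl describe svc redis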
  9. Kubernetes NetworkPolicy
     "A specification of how groups of Pods are allowed to communicate with each other and other network endpoints."
     A NetworkPolicy selects the Pods it applies to by matching labels.
     Example (Redis Pods only accept traffic from the web front-end Pods):
       kubectl describe netpol web-front-redis
       Name:       web-front-redis
       Namespace:  default
       Spec:
         PodSelector: app=redis
         Allowing ingress traffic:
           To Port: 6379/TCP
           From: PodSelector: app=web-front-end
         Policy Types: Ingress
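     The manifest behind such a policy would look roughly like this (an illustrative sketch matching the selectors and port shown on the slide):

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-front-redis
spec:
  podSelector:
    matchLabels:
      app: redis             # the policy applies to the Redis Pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web-front-end # only the web front-end may connect
    ports:
    - protocol: TCP
      port: 6379
EOF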
  10. Kubernetes Cluster Networking
      Three communication patterns must be enabled:
      - Pod-to-Pod
      - Pod-to-Service
      - External-to-Service
  11. What is a Kubernetes CNI Network Plugin responsible for?
      - Pod network connectivity: plumbing eth0 (the Pod's network interface) into the Pod network, and IP Address Management (IPAM)
      - E-W Service load balancing (optional): make traffic available to the upstream kube-proxy, or implement native Service load balancing (VIP DNAT)
      - NetworkPolicy enforcement (optional): enforcing Kubernetes NetworkPolicy
      - Traffic shaping support (experimental)
  12. kubenet: the out-of-box Kubernetes network plugin
      - Relies on the cloud network to route traffic between Nodes
      - Typically works with a Cloud Provider implementation that adds routes to the cloud router; supported on AWS, Azure, GCP
      - No NetworkPolicy support
      Diagram: on each Node, Pods attach through veth pairs to a cbr0 Linux bridge (10.10.1.1/24 on Node 1, 10.10.2.1/24 on Node 2); the cloud network fabric routes each Pod subnet to the owning Node's IP (10.10.1.0/24 via 172.1.1.11, 10.10.2.0/24 via 172.1.2.22).
  13. kube-proxy: E-W Service load balancing
      - Implements distributed load balancing for Services of the ClusterIP and NodePort types
      - Supports iptables, IPVS, and user-space proxy modes
      Picture from: https://kubernetes.io/docs/concepts/services-networking/service
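      A rough sketch of what the iptables mode boils down to for the Redis Service of slide 8 (illustrative only; kube-proxy actually programs its own KUBE-SERVICES / KUBE-SEP chains):

# DNAT the ClusterIP to one of the two endpoints, picked at random
iptables -t nat -A PREROUTING -p tcp -d 172.30.0.24 --dport 6379 \
    -m statistic --mode random --probability 0.5 \
    -j DNAT --to-destination 10.24.0.5:6379
iptables -t nat -A PREROUTING -p tcp -d 172.30.0.24 --dport 6379 \
    -j DNAT --to-destination 10.24.2.7:6379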
  14. Container Network Interface (CNI): where does the CNI fit in the Pod's lifecycle?
      1. The user creates the Pod spec
      2. The Pod is scheduled on a Node
      3. kubelet makes a CRI call to the container runtime (e.g. containerd)
      4. The runtime runs the Pod
      5. The runtime makes a CNI call to the network plugin
      6. The network plugin adds the Pod to the Pod network
  15. What is Open vSwitch (OVS)? And why use it for K8s networking?
      - A high-performance programmable virtual switch; connects to VMs (tap) and containers (veth)
      - Linux Foundation project, very active
      - Portable: works out of the box on all Linux distributions, and supports Windows
      - Programmability: supports many protocols; build your own forwarding pipeline
      - High performance: DPDK, AF_XDP, hardware offload available across multiple vendors
      - Rich feature set: multi-layer (L2 to L4), advanced CLI tools, statistics, QoS, packet tracing
  16. Configuring Pod networking with OVS, step by step: parameters of the CNI ADD call
      From environment variables:
        CNI_COMMAND=ADD
        CNI_CONTAINERID=79ba130ac32e1c621e0e10ea10e3e8b7c0b101932f309ead54ee93fdf1795768
        CNI_NETNS=/proc/1125/ns/net
        CNI_IFNAME=eth0
        CNI_ARGS="K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx-66b6c48dd5-skx7z;K8S_POD_INFRA_CONTAINER_ID=79ba130ac32e1c621e0e10ea10e3e8b7c0b101932f309ead54ee93fdf1795768"
        CNI_PATH=/opt/cni/path
      From stdin:
        {
          "cniVersion": "0.3.0",
          "name": "antrea",
          "type": "antrea",
          "dns": {},
          "ipam": {
            "type": "host-local",
            "subnet": "10.10.1.0/24",
            "gateway": "10.10.1.1"
          }
        }
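      For illustration (not on the slide), a CNI ADD call can be driven by hand the same way the container runtime does it; the plugin binary location and config file path below are assumptions:

# Environment variables identify the container and its netns
export CNI_COMMAND=ADD
export CNI_CONTAINERID=79ba130ac32e1c621e0e10ea10e3e8b7c0b101932f309ead54ee93fdf1795768
export CNI_NETNS=/proc/1125/ns/net
export CNI_IFNAME=eth0
export CNI_PATH=/opt/cni/bin

# The network configuration is passed on stdin; on success the plugin prints a JSON
# result (interface, assigned IP, routes) on stdout
/opt/cni/bin/antrea < /etc/cni/net.d/10-antrea.conf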
  17. Connecting the Pod to the OVS bridge: create the bridge
      (K8s Node, root netns; K8s Pod nginx-66b6c48dd5-skx7z in netns /proc/1125/ns/net)
        ovs-vsctl add-br br-int
  18. Connecting the Pod to the OVS bridge: create a veth pair inside the Pod's netns
        nsenter -t 1125 -n bash
        ip link add eth0 type veth peer name veth1
  19. Connecting the Pod to the OVS bridge: move the host end of the veth pair to the root netns
        nsenter -t 1125 -n bash
        ip link add eth0 type veth peer name veth1
        ip link set veth1 netns 1
  20. Connecting the Pod to the OVS bridge: configure the Pod interface (MTU, IP address 10.10.1.2/24, default route)
        nsenter -t 1125 -n bash
        ip link add eth0 type veth peer name veth1
        ip link set veth1 netns 1
        ip link set eth0 mtu <MTU>
        ip addr add 10.10.1.2/24 dev eth0
        ip route add default via 10.10.1.1 dev eth0
        ip link set dev eth0 up
        exit
  21. Connecting the Pod to the OVS bridge: attach veth1 to the bridge
        ovs-vsctl add-port br-int veth1
        ovs-vsctl show
          Bridge br-int
            ...
            Port veth1
              Interface veth1
          ...
          ovs_version: "2.14.0"
  22. Intra-Node Pod-to-Pod traffic
      - By default OVS behaves like a regular L2 Linux bridge
      - A network plugin using OVS can provide additional security by preventing IP / ARP spoofing:
        ovs-ofctl add-flow br-int "table=0,priority=200,arp,in_port=nginx,arp_spa=10.10.1.2,arp_sha=<MAC>,actions=goto_table:10"
        ovs-ofctl add-flow br-int "table=0,priority=200,ip,in_port=nginx,nw_src=10.10.1.2,dl_src=<MAC>,actions=goto_table:10"
        ovs-ofctl add-flow br-int "table=0,priority=0,actions=drop"
        ovs-ofctl add-flow br-int "table=10,priority=0,actions=NORMAL"
      Diagram: PodA (10.10.1.2/24, veth1) and PodB (10.10.1.3/24, veth2) attached to the br-int bridge in the Node's root netns.
  23. Inter-Node Pod-to-Pod traffic
      - The default gateway for Pod1A is 10.10.1.1, which is assigned to gw0, an OVS bridge internal port
      - All traffic that is not destined to a local Pod is forwarded to gw0. Then what? Build an overlay network.
      Diagram: Node 1 (172.1.1.11, Pod subnet 10.10.1.0/24, gw0 10.10.1.1/24) and Node 2 (172.1.2.22, Pod subnet 10.10.2.0/24, gw0 10.10.2.1/24), connected by the cloud / physical network fabric.
  24. Inter-Node Pod-to-Pod traffic: building an overlay network with OVS
      Supported tunnel protocols: Geneve / VXLAN / GRE / STT
        # on Node 1
        ovs-vsctl add-port br-int tun0 -- set interface tun0 type=geneve options:remote_ip=flow options:key=flow
        ovs-vsctl show
          Bridge br-int
            ...
            Port tun0
              Interface tun0
                type: geneve
                options: {key=flow, remote_ip=flow}
            Port gw0
              Interface gw0
                type: internal
          ovs_version: "2.14.0"
  25. Inter-Node Pod-to-Pod traffic: building an overlay network with OVS (continued)
      - Each Node has its own Pod subnet; the broadcast domain is limited to a single Node
      - Each Node's Pod subnet is read from the K8s API
      - New flows for inter-Node traffic:
        # on Node 1
        ovs-ofctl add-flow br-int "table=10,priority=200,ip,nw_dst=10.10.2.0/24,actions=dec_ttl,set_field:172.1.2.22->tun_dst,output:tun0"
        ovs-ofctl add-flow br-int "table=10,priority=200,ip,in_port=tun0,nw_dst=10.10.1.11,actions=mod_dl_dst:<MAC_POD1A>,mod_dl_src:<MAC_GW0>,output:veth1"
        ovs-ofctl add-flow br-int "table=10,priority=200,ip,in_port=tun0,nw_dst=10.10.1.12,actions=mod_dl_dst:<MAC_POD1B>,mod_dl_src:<MAC_GW0>,output:veth2"
  26. Recap: K8s networking with Open vSwitch
      - L2 switching for local (intra-Node) Pod-to-Pod traffic
      - Overlay network for inter-Node Pod-to-Pod traffic
      - SNAT for Pod-to-external traffic
      - OVS programmability supports implementing the entire K8s network model
      Diagram: two Nodes (VMs), each with an OVS bridge connecting the Pods' veth interfaces, a gw0 port, and a tun0 port; the paths shown are Pod-to-Pod (intra-Node), Pod-to-Pod (inter-Node, over the tunnel), and Pod-to-external (SNAT out of the NIC to the cloud network fabric).
  27. Kubernetes CNI Plugins (comparison of Antrea, Calico, Cilium, and Flannel)
      - Antrea: data plane Open vSwitch; network modes: overlay (Geneve, VXLAN, GRE, STT) or no-encapsulation; NetworkPolicy: Open vSwitch, with centralized policy computation; Windows support: Open vSwitch for Windows
      - Calico: data plane BIRD (BGP), IPTables, eBPF (since v3.16.0); network modes: overlay (IPIP, VXLAN) or BGP routing; NetworkPolicy: IPTables or eBPF; Windows support: BGP, Virtual Filtering Platform
      - Cilium: data plane eBPF; network modes: overlay (Geneve, VXLAN) or no-encapsulation; NetworkPolicy: eBPF; Windows support: N/A
      - Flannel: data plane Linux bridge; network modes: overlay (VXLAN) or no-encapsulation; NetworkPolicy: N/A; Windows support: win-bridge or win-overlay
      26 "third party" plugins are listed at https://github.com/containernetworking/cni, besides the "core plugins" maintained by the CNI project. There are also CNI plugins for specific cloud / IaaS platforms.
  28. Project Antrea
      An open source CNI network plugin for Kubernetes based on Open vSwitch, providing:
      - Pod network connectivity
      - NetworkPolicy enforcement
      - Service load balancing
      https://antrea.io | @ProjectAntrea | https://github.com/vmware-tanzu/antrea | Kubernetes Slack #antrea
  29. Antrea is a community-driven project focusing on:
      - simplifying usability & diagnostics,
      - adapting to any cloud and network topology,
      - providing comprehensive security policies, and
      - improving scaling & performance for container networking in Kubernetes.
      Runs on private cloud, public cloud, and edge; Linux and Windows.
      782 GitHub stars, 136 GitHub forks, 42 contributors.
      https://antrea.io | @ProjectAntrea | https://github.com/vmware-tanzu/antrea | Kubernetes Slack #antrea
  30. Project Antrea Architecture
      Antrea Agent (one per Node):
      - Manages Pod network interfaces and the OVS bridge
      - Implements the overlay network, NetworkPolicies, and Service load balancing with OVS
      Antrea Controller:
      - Computes NetworkPolicies and publishes the results to the Antrea Agents
      - High-performance channel to the Agents based on the K8s apiserver library
      Built with K8s technologies:
      - Leverages K8s and K8s solutions for API, control plane, deployment, UI and CLI
      - The Antrea Controller and Agent are based on the K8s controller and apiserver libraries
      Open vSwitch provides a flexible and performant data plane.
      Diagram: the Master Node runs kube-api, CRDs, and the Antrea Controller (control plane); each Worker Node runs kubelet, kube-proxy (IPtables), and the Antrea Agent, which handles the CNI calls from kubelet; Pods connect over veth pairs to the OVS bridge, with a Gateway port and a Tunnel between Nodes (data plane).
      Deployment:
        kubectl apply -f https://github.com/vmware-tanzu/antrea/releases/download/v0.10.1/antrea.yml
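      To sanity-check the deployment (not on the slide; the kube-system namespace and the app=antrea label are assumptions about the default manifest):

# The Controller Deployment and the per-Node Agent DaemonSet should be Running
kubectl get pods -n kube-system -l app=antrea -o wide
# Antrea also registers CRDs for its own APIs
kubectl get crds | grep antrea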
  31. Demo Video 1: Antrea Setup and OVS Networking
      https://youtu.be/KGjGimuLXSI
  32. NetworkPolicy Implementation
      Centralized controller for NetworkPolicy computation:
      - The Antrea Controller watches NetworkPolicies, Pods (Namespace, labels, IP address, Node) and Namespaces (labels) through the K8s API
      - Each Node's Agent receives only the relevant data: compute once, and the Agent just performs a simple conversion to OVS flows
      - Controller = single source of truth
      - High-performance communication channel to the Agents built with the K8s apiserver library
      Example NetworkPolicy: "Pods with label 'app=server' can only receive traffic from Pods with label 'app=client', and only on port TCP 80."
      The Controller computes and distributes:
        AppliedToGroup: Name "foo", Pods {Pod3B}
        AddressGroup:   Name "bar", Pods {10.10.1.2}
        NetworkPolicy:  Rule {Direction: Ingress, From: {"bar"}, Ports: {TCP/80}}, AppliedTo: {"foo"}, Span = {Node2, Node3}
      Use OVS flow conjunction to reduce the number of flows. On Node 3 (reg1 saves the input OVS port's ofport number; Pod3B's ofport = 4):
        Table 90 (IngressRule table):
          priority=200,ip,nw_src=10.10.1.2 actions=conjunction(1,1/3)
          priority=200,ip,reg1=0x4 actions=conjunction(1,2/3)
          priority=200,tcp,tp_dst=80 actions=conjunction(1,3/3)
          priority=190,conj_id=1,ip actions=goto_table:105
        Table 100 (IngressRuleDefault table):
          priority=200,ip,reg1=0x4 actions=drop
      Diagram: Pod1A (app=client, 10.10.1.2) on Node 1; Pod2A (app=server, 10.10.2.2) and a second server Pod (10.10.2.3) on Node 2; Pod3A (app=other, 10.10.3.2) and Pod3B (app=server, 10.10.3.3) on Node 3; each Node's Antrea Agent programs its OVS bridge via OpenFlow.
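      The flows programmed by the Agent can be inspected directly on a Node (the bridge name and table numbers are the ones used above):

# Conjunction flows for the ingress rule, and the default drop for isolated Pods
ovs-ofctl dump-flows br-int table=90
ovs-ofctl dump-flows br-int table=100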
  33. Demo Video 2: Network Policies and Traceflow
      https://youtu.be/Sv_7qI7A2YY
  34. The Open vSwitch pipeline
  35. Antrea in the cloud-native ecosystem: providing visibility into the network
      - Prometheus metrics exported from the Agents and the Controller
      - Octant plugin to monitor components and trace packets
      - ELK stack to visualize flow maps for the cluster network
  36. Demo Video 3: K8s Network Visibility with Antrea
      https://youtu.be/qzTeUaePJRo
  37. Conclusion
      - Network plugins implement the CNI and provide L2/L3 connectivity in K8s clusters
      - Open vSwitch can implement the full K8s network model with a unified data plane
      - Project Antrea: a production-grade network plugin built in less than a year, with OVS as the data plane and the K8s libraries for a highly-scalable control plane
      - Integrations with cloud-native ecosystem tools provide visibility into the network; suggest new integrations to us on GitHub!
  38. Come help us continually improve Kubernetes networking!
      - Kubernetes Slack: #antrea
      - Community meeting: Mondays @ 9PM PT (Zoom link)
      - https://github.com/vmware-tanzu/antrea: good first issues, help us improve our documentation, propose new features, file bugs
      - Google Groups: projectantrea-announce, projectantrea, projectantrea-dev
      - @ProjectAntrea
      - https://antrea.io: documentation, blogs
