
Learning how AWS implements the AWS VPC CNI

  1. AWS VPC CNI • AWS User Group Taiwan • HungWei Chiu
  2. Bio • HungWei Chiu (Hwchiu) • MTS @ Open Networking Foundation (ONF) • Kubernetes, containers, Linux, networking, etc. • Blog: https://hwchiu.com • Facebook: 矽谷牛的耕田筆記
  3. Agenda • Network Connectivity • What • How • AWS VPC CNI • What • Why • How
  4. Network Connectivity • How a service accesses the outside world • How a service is accessed by other services • IPAM (IP address management) • Environments • Bare metal • Virtualization • VM/Container • Orchestrator • OpenStack/K8s
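  5. Bare Metal [Diagram: a home network. A PPPoE router (public IP 66.88.99.45, LAN address 192.168.0.1) connects a laptop, a server, and a mobile phone over wired and wireless links; the devices use 192.168.0.12, 192.168.0.3, and 192.168.0.5. Image source: https://awei791129.pixnet.net/blog/post/20872246]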
  6. NAT • Source NAT (SNAT) and Destination NAT (DNAT) • SNAT • Changes the source IP • Internal to external • DNAT • Changes the destination IP • External to internal
  7. SNAT [Diagram: the same home network. A packet from 192.168.0.12 to 8.8.8.8 leaves the router with its source rewritten to the public address 66.88.99.45.]
  8. DNAT (Port Mapping) [Diagram: the same home network. A packet from 8.8.8.8 to the router's public address 66.88.99.45 is forwarded with its destination rewritten to the internal host 192.168.0.12.]
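To make the SNAT/DNAT distinction concrete, here is a toy Python model of the home router in these diagrams. It is a sketch, not how Linux connection tracking works: real NAT tracks full 5-tuples and timeouts, while this keeps one table keyed by public port. `ToyNatRouter`, `Packet`, and the 40000 starting port are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ToyNatRouter:
    """Hypothetical home router: SNAT for outbound traffic, DNAT for replies and port mappings."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port  # assumed start of the public ports handed out by SNAT
        # public port -> (private ip, private port), recorded by SNAT for reply traffic
        self.snat_state: Dict[int, Tuple[str, int]] = {}
        # public port -> (private ip, private port), configured by the user (port mapping)
        self.port_mappings: Dict[int, Tuple[str, int]] = {}

    def outbound(self, pkt: Packet) -> Packet:
        """SNAT: replace the private source with the public IP and a fresh public port."""
        public_port = self.next_port
        self.next_port += 1
        self.snat_state[public_port] = (pkt.src_ip, pkt.src_port)
        return replace(pkt, src_ip=self.public_ip, src_port=public_port)

    def inbound(self, pkt: Packet) -> Optional[Packet]:
        """DNAT: rewrite the destination to an internal host, or return None to drop."""
        if pkt.dst_ip != self.public_ip:
            return None
        target = self.snat_state.get(pkt.dst_port) or self.port_mappings.get(pkt.dst_port)
        if target is None:
            return None  # neither a reply nor a mapped port: unsolicited traffic is dropped
        private_ip, private_port = target
        return replace(pkt, dst_ip=private_ip, dst_port=private_port)
```

For example, `ToyNatRouter("66.88.99.45").outbound(Packet("192.168.0.12", 51000, "8.8.8.8", 53))` returns a packet whose source is `66.88.99.45:40000`; a reply addressed to port 40000 is rewritten back to `192.168.0.12:51000`. Adding an entry to `port_mappings` is the port-mapping (DNAT) case from slide 8.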
  9. Bare Metal [Diagram: double NAT. The home router (66.88.99.45 / 192.168.0.1) serves a server at 192.168.0.3 and a second router (192.168.0.2 upstream, 10.15.0.2 downstream) with a wireless AP; devices behind it use 10.15.0.12, 10.15.0.5, and 10.15.0.6. Both routers perform SNAT/DNAT.]
  11. NAT • Makes debugging more complex • Degrades network performance • Increases security • Decreases accessibility
  12. Container [Diagram: a server (192.168.0.12, eth0) behind the home router (66.88.99.45 / 192.168.0.1). Inside the server, a Linux bridge (10.18.0.1) connects three containers at 10.18.0.2, 10.18.0.12, and 10.18.0.4.]
  13. NAT [Diagram: the same setup. A packet from container 10.18.0.2 to 8.8.8.8 is SNATed twice: first by the server to 192.168.0.12, then by the home router to 66.88.99.45.]
  14. Docker Expose (-p 8080:80) [Diagram: a browser on a laptop (192.168.0.15) connects to the server at 192.168.0.12; the server DNATs the packet to container 10.18.0.4, and the reply travels the reverse path. The client address 192.168.0.15 is never rewritten.]
  15. Container • Private subnet by default • NAT is required • Docker simplifies the DNAT process (-p) • More and more NAT
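Docker's `-p` flag is essentially a DNAT rule on the host. The sketch below parses a subset of the `-p` syntax (`[ip:]hostPort:containerPort[/protocol]`; IPv6 host addresses, port ranges, and omitted host ports are not handled) and applies the resulting rewrite. Docker itself implements this with iptables rules or its userland proxy; `PortPublish`, `parse_publish`, and `dnat` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PortPublish:
    host_ip: str         # "0.0.0.0" means every host address
    host_port: int
    container_port: int
    protocol: str


def parse_publish(spec: str) -> PortPublish:
    """Parse a subset of Docker's -p syntax: [ip:]hostPort:containerPort[/protocol]."""
    ports, _, protocol = spec.partition("/")
    parts = ports.split(":")
    if len(parts) == 2:
        host_ip, host_port, container_port = "0.0.0.0", parts[0], parts[1]
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported -p value: {spec!r}")
    return PortPublish(host_ip, int(host_port), int(container_port), protocol or "tcp")


def dnat(rule: PortPublish, container_ip: str, dst_ip: str, dst_port: int,
         protocol: str = "tcp") -> Optional[Tuple[str, int]]:
    """Return the rewritten (ip, port) for a matching packet; the source is left unchanged."""
    if protocol != rule.protocol or dst_port != rule.host_port:
        return None
    if rule.host_ip not in ("0.0.0.0", dst_ip):
        return None
    return container_ip, rule.container_port
```

With the values from slide 14, `dnat(parse_publish("8080:80"), "10.18.0.4", "192.168.0.12", 8080)` returns `("10.18.0.4", 80)`, while the client address 192.168.0.15 stays untouched.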
  16. Advanced Mode [Diagram: the server's containers attach through a switch directly to the home LAN and get LAN addresses (192.168.0.32, 192.168.0.28, 192.168.0.25). The laptop's browser (192.168.0.15) talks to 192.168.0.28 without any address rewriting.]
  17. Advanced Mode • NAT isn't necessary • Better performance • Issues • How to manage containers' IP addresses? • Conflicts? • Multiple nodes?
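The questions on this slide (who hands out addresses, how to avoid conflicts, how to scale to several nodes) are what an IPAM answers. One common approach, sketched here under assumed values, carves a cluster CIDR into one subnet per node so that nodes never hand out overlapping addresses. `ToyClusterIpam`, the 10.18.0.0/16 cluster range, and the /24 per-node size are hypothetical.

```python
import ipaddress
from typing import Dict, Set


class ToyClusterIpam:
    """Hypothetical IPAM: one subnet per node, unique container IPs inside each subnet."""

    def __init__(self, cluster_cidr: str, node_prefix: int = 24) -> None:
        # Lazily yields non-overlapping per-node subnets, e.g. 10.18.0.0/24, 10.18.1.0/24, ...
        self._free_subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=node_prefix)
        self.node_subnets: Dict[str, ipaddress.IPv4Network] = {}
        self._used: Dict[str, Set[ipaddress.IPv4Address]] = {}

    def add_node(self, node: str) -> ipaddress.IPv4Network:
        subnet = next(self._free_subnets)  # raises StopIteration once the cluster CIDR is used up
        self.node_subnets[node] = subnet
        # Reserve the first host address for the node's Linux bridge (gateway).
        self._used[node] = {next(iter(subnet.hosts()))}
        return subnet

    def allocate(self, node: str) -> ipaddress.IPv4Address:
        used = self._used[node]
        for ip in self.node_subnets[node].hosts():
            if ip not in used:
                used.add(ip)
                return ip
        raise RuntimeError(f"no free container IPs left on {node}")

    def release(self, node: str, ip: ipaddress.IPv4Address) -> None:
        self._used[node].discard(ip)
```

With `ToyClusterIpam("10.18.0.0/16")`, the first node gets 10.18.0.0/24 (bridge 10.18.0.1, first container 10.18.0.2) and the second gets 10.18.1.0/24. Uniqueness then only has to be checked per node, which is what makes the multi-node case tractable.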
  18. Container Clusters [Diagram: two servers on the home LAN, 192.168.0.12 and 192.168.0.15, each with its own Linux bridge and containers. The first server's bridge is 10.18.0.1 and its containers use 10.18.0.2, 10.18.0.12, and 10.18.0.4.]
  19. Same subnet [Diagram: the second server's bridge also uses 10.18.0.1, with containers at 10.18.0.15 and 10.18.0.12, so both servers use the same range and 10.18.0.12 exists on each.]
  20. Different Subnet [Diagram: the second server's containers move to their own range: bridge 10.19.0.1, containers 10.19.0.23 and 10.19.0.15.]
  21. Across-Node Accessibility [Diagram: the two-server layout from slide 20 (10.18.0.x on 192.168.0.12, 10.19.0.x on 192.168.0.15); containers on different servers need to reach each other.]
  22. Across-Node Accessibility • SNAT and DNAT must be performed for each flow direction • Container cluster (Kubernetes) • How does K8s solve it? • CNI (Container Network Interface) • Tunneling protocols • VXLAN, IPIP
  23. Tunneling • IP over IP • Encapsulates the original IP packet inside an additional IP header • Inner IPv4 header • Container to container • Outer IPv4 header • Node to node
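IP-in-IP (IP protocol number 4) simply prepends a second IPv4 header. The Python sketch below builds minimal 20-byte headers with the standard header checksum and wraps a container-to-container packet in a node-to-node header. It is illustrative only: in practice the kernel's `tunl` device does this work, the four-byte payload is a placeholder rather than a valid ICMP message, and the function names are hypothetical.

```python
import socket
import struct

IPPROTO_ICMP = 1
IPPROTO_IPIP = 4  # IANA protocol number for IP-in-IP encapsulation


def ipv4_checksum(header: bytes) -> int:
    """RFC 791 checksum: one's complement of the one's-complement sum of 16-bit words."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, payload_len: int, protocol: int, ttl: int = 64) -> bytes:
    """Minimal 20-byte IPv4 header: no options, ID 0, no fragmentation flags."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len,  # version 4 + IHL 5, TOS, total length
        0, 0,                       # identification, flags/fragment offset
        ttl, protocol, 0,           # TTL, protocol, checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def ipip_encapsulate(inner_packet: bytes, node_src: str, node_dst: str) -> bytes:
    """Prepend an outer node-to-node header (protocol 4) to a container-to-container packet."""
    return ipv4_header(node_src, node_dst, len(inner_packet), IPPROTO_IPIP) + inner_packet


def ipip_decapsulate(outer_packet: bytes) -> bytes:
    """Strip the outer header and return the original inner packet."""
    if outer_packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IP-in-IP packet")
    ihl = (outer_packet[0] & 0x0F) * 4  # header length in bytes
    return outer_packet[ihl:]
```

Encapsulating a 24-byte inner packet (10.19.0.15 to 10.18.0.2) between nodes 192.168.0.15 and 192.168.0.12 yields a 44-byte packet; the extra 20 bytes per packet are part of the overhead that the summary slide compares against.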
  25. Example • NodeA (192.168.0.15) • Container A (10.19.0.15) • NodeB (192.168.0.12) • Container B (10.18.0.2) • Container A pings Container B
  26. Example [Diagram: NodeA (eth0 192.168.0.15, tunl device) hosts the container 10.19.0.15; NodeB (eth0 192.168.0.12, tunl device) hosts the container 10.18.0.2.]
  27. Example [Diagram: the container sends a packet 10.19.0.15 → 10.18.0.2 carrying Data; NodeA's tunl device wraps it in an outer header 192.168.0.15 → 192.168.0.12; NodeB strips the outer header and delivers the original 10.19.0.15 → 10.18.0.2 packet with its Data.]
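The routing decision each node makes in this example fits in a few lines: if the destination container lives in another node's subnet, add an outer node-to-node header; the receiving node strips it. In this sketch packets are plain tuples rather than real headers, and the /24 container subnets plus the `Node` and `IP` names are assumptions.

```python
import ipaddress
from typing import Dict, NamedTuple


class IP(NamedTuple):
    src: str
    dst: str
    payload: object  # application data, or an inner IP packet when encapsulated


class Node:
    """Hypothetical node with one local container subnet and IP-in-IP routes to its peers."""

    def __init__(self, name: str, node_ip: str, pod_cidr: str) -> None:
        self.name = name
        self.node_ip = node_ip
        self.pod_cidr = ipaddress.ip_network(pod_cidr)
        self.tunnel_routes: Dict[ipaddress.IPv4Network, str] = {}  # peer subnet -> peer node IP

    def add_peer(self, peer: "Node") -> None:
        self.tunnel_routes[peer.pod_cidr] = peer.node_ip

    def send(self, pkt: IP) -> IP:
        dst = ipaddress.ip_address(pkt.dst)
        if dst in self.pod_cidr:
            return pkt  # local container: delivered over the bridge, no tunnel needed
        for cidr, peer_ip in self.tunnel_routes.items():
            if dst in cidr:
                return IP(self.node_ip, peer_ip, pkt)  # tunl: add the outer node-to-node header
        raise LookupError(f"{self.name}: no route to {pkt.dst}")

    def receive(self, pkt: IP) -> IP:
        if pkt.dst == self.node_ip and isinstance(pkt.payload, IP):
            return pkt.payload  # tunl: strip the outer header
        return pkt
```

With NodeA on 10.19.0.0/24 and NodeB on 10.18.0.0/24, `node_a.send(IP("10.19.0.15", "10.18.0.2", "Data"))` produces `IP("192.168.0.15", "192.168.0.12", <inner packet>)`, and `node_b.receive(...)` returns the inner packet unchanged.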
  28. AWS VPC CNI • AWS VPC CNI • AWS VPC • CNI (Container Network Interface) • Kubernetes uses it to set up network connectivity • What are Kubernetes and CNI?
  29. Kubernetes (container orchestrator) [Image source: https://kubernetes.io/blog/2018/07/18/11-ways-not-to-get-hacked/]
  30. Kubernetes CNI [Diagram: two servers (K8s nodes), each running a kubelet, a Pod (sandbox), and a CNI binary, with numbered steps 1 to 3 between them.] • Executed by the kubelet • A CNI plugin is a standalone executable binary • Sets up the network connectivity for the sandbox (pause container)
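Per the CNI specification, the runtime executes the plugin binary with its parameters in environment variables (`CNI_COMMAND`, `CNI_CONTAINERID`, `CNI_NETNS`, `CNI_IFNAME`, `CNI_PATH`) and the network configuration as JSON on stdin; the plugin prints a JSON result on stdout. The Python skeleton below shows only that contract. It performs no real network setup, the IP values are hypothetical placeholders, the result is trimmed to `interfaces` and `ips`, and error reporting and the `CHECK` command are omitted.

```python
import json
import os
import sys
from typing import Dict


def cni_add(env: Dict[str, str], stdin_config: str, pod_ip: str, gateway: str) -> dict:
    """Build the JSON result an ADD call prints (trimmed to interfaces and ips)."""
    config = json.loads(stdin_config)  # the network configuration arrives on stdin
    return {
        "cniVersion": config["cniVersion"],
        "interfaces": [{"name": env["CNI_IFNAME"], "sandbox": env["CNI_NETNS"]}],
        "ips": [{"address": pod_ip, "gateway": gateway, "interface": 0}],
    }


def main() -> int:
    command = os.environ.get("CNI_COMMAND")
    if command == "ADD":
        # A real plugin would create the veth pair, move one end into CNI_NETNS and
        # program addresses and routes here; the addresses below are placeholders.
        result = cni_add(dict(os.environ), sys.stdin.read(), "10.19.0.15/24", "10.19.0.1")
        json.dump(result, sys.stdout)
        return 0
    if command == "DEL":
        return 0  # teardown would go here; DEL should tolerate already-missing resources
    if command == "VERSION":
        json.dump({"cniVersion": "1.0.0", "supportedVersions": ["1.0.0"]}, sys.stdout)
        return 0
    print(f"unsupported CNI_COMMAND: {command}", file=sys.stderr)
    return 1


# Only act when invoked by a container runtime, which always sets CNI_COMMAND.
if __name__ == "__main__" and "CNI_COMMAND" in os.environ:
    sys.exit(main())
```

The key design point from the slide is that the plugin is a short-lived process, not a daemon: all state it needs arrives through the environment and stdin on each call, which is why the AWS VPC CNI pairs it with a long-running L-IPAMD.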
  31. AWS VPC CNI • Goals • Support high throughput and availability, and low latency • Users must be able to express and enforce network policies and isolation • Comparable to native EC2 networking and security groups
  32. AWS VPC CNI • Goals • Network operation must be simple and secure • Use VPC flow logs • Apply VPC routing policies • Pod networking should be set up in a matter of seconds
  33. AWS VPC [Diagram: a VPC 10.2.0.0/16 with Subnet A 10.2.0.0/24 containing three EC2 instances (10.2.0.5, 10.2.0.6, 10.2.0.80) that exchange network traffic over the underlay network.]
  34. AWS VPC and K8s [Diagram: the same VPC and subnet, now with two Pods on each of the three EC2 instances (10.2.0.5, 10.2.0.6, 10.2.0.80).]
  35. Other CNI (IP over IP) [Diagram: the same VPC, but the Pods get addresses from separate overlay subnets unrelated to the VPC range: 10.56.2.0/24 (10.56.2.5, 10.56.2.15), 10.56.5.0/24 (10.56.5.5, 10.56.5.48), and 10.56.9.0/24 (10.56.9.5, 10.56.9.25).]
  36. Other CNI (IP over IP) [Diagram: as on slide 35; on the underlay, Pod-to-Pod traffic appears only as instance-to-instance traffic, e.g. 10.2.0.5 -> 10.2.0.80.]
  37. Other CNI (IP over IP) [Diagram: as on slide 36, annotated "Security Group?" and "Visibility?": security groups and VPC tooling only see the outer 10.2.0.5 -> 10.2.0.80 traffic.]
  38. AWS VPC CNI [Diagram: the Pods now get addresses directly from Subnet A (10.2.0.26, 10.2.0.16, 10.2.0.53, 10.2.0.54, 10.2.0.82, 10.2.0.182), so Pod traffic such as 10.2.0.20 -> 10.2.0.82 appears in the VPC as-is.]
  39. AWS VPC CNI [Diagram: as on slide 38, annotated "Security Group" and "Visibility": both now apply to Pod traffic directly.]
  40. AWS VPC CNI • Requirements • IPAM (IP address management) • Unique addresses • Routing rules
  41. Implementation • Currently • Each EC2 instance can have multiple elastic network interfaces (ENIs) • Each ENI can have multiple IPv4/IPv6 addresses • The EC2-VPC fabric delivers packets for these addresses to the instance • The primary ENI IP address is automatically assigned to the interface • All secondary addresses remain unassigned • The host owner must configure them
  42. Components [Diagram source: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/cni-proposal.md]
  43. L-IPAMD • Local IP Address Manager (L-IPAM) • A small, single binary on each host that maintains a warm pool of available secondary IP addresses
  44. L-IPAMD • Maintaining the warm pool of available secondary IP addresses • Number of free IPs < threshold • Create a new ENI and attach it to the instance • Allocate all available IP addresses on this new ENI • Wait for the IP addresses to be ready, then add them to the warm pool • Number of free IPs > threshold • Detach an ENI and free it along with its IPs
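The warm-pool logic above can be modeled as a small reconcile loop. This is a simplified sketch, not L-IPAMD's code: the `low`/`high` thresholds and the ENI and IP names are hypothetical (the real daemon exposes its own tuning settings, such as `WARM_ENI_TARGET`), and AWS API calls are replaced by list operations. Each ENI contributes N - 1 usable IPs because its primary address is not given to Pods.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Eni:
    eni_id: str
    secondary_ips: List[str]
    assigned: Set[str] = field(default_factory=set)  # secondary IPs currently given to Pods


class ToyWarmPool:
    """Simplified model of the L-IPAMD warm pool; thresholds and names are hypothetical."""

    def __init__(self, max_enis: int, ips_per_eni: int, low: int, high: int) -> None:
        self.max_enis, self.ips_per_eni = max_enis, ips_per_eni
        self.low, self.high = low, high
        self.enis: List[Eni] = []
        self._next_id = 0
        self._attach_eni()  # the primary ENI exists from instance launch and is never detached

    def _attach_eni(self) -> None:
        n = self._next_id
        self._next_id += 1
        # N addresses per ENI, but the primary one is not handed to Pods: N - 1 secondaries.
        ips = [f"eni{n}-ip{i}" for i in range(1, self.ips_per_eni)]
        self.enis.append(Eni(f"eni-{n}", ips))

    def free_ips(self) -> int:
        return sum(len(e.secondary_ips) - len(e.assigned) for e in self.enis)

    def reconcile(self) -> None:
        # Below the low threshold: attach ENIs (up to the instance limit) to refill the pool.
        while self.free_ips() < self.low and len(self.enis) < self.max_enis:
            self._attach_eni()
        # Above the high threshold: detach idle non-primary ENIs, releasing their IPs.
        while self.free_ips() > self.high:
            idle = [e for e in self.enis[1:] if not e.assigned]
            if not idle:
                break
            self.enis.remove(idle[-1])

    def assign(self) -> Optional[str]:
        for eni in self.enis:
            for ip in eni.secondary_ips:
                if ip not in eni.assigned:
                    eni.assigned.add(ip)
                    return ip
        return None  # pool exhausted

    def release(self, ip: str) -> None:
        for eni in self.enis:
            eni.assigned.discard(ip)
```

With the t3.medium figures used later in the deck (3 ENIs, 6 addresses each), the pool never holds more than 3 * 5 = 15 Pod IPs. Keep `high` at least `low + (N - 1)`; otherwise detaching one idle ENI drops the pool below `low` and the next reconcile attaches it again.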
  45. CNI Plugin • Get one of the secondary IP addresses that L-IPAMD has assigned to the instance • Set up the network devices • Host • Pod (sandbox) • Set up the routing rules • Host • Pod
  46. AWS VPC CNI [Diagram: VPC 172.31.0.0/16, Subnet A 172.31.0.0/20, and a node (172.31.1.204) whose ENI IPs are held in L-IPAMD's IP pool, next to the VPC CNI binary.]
  47. AWS VPC CNI [Diagram: as on slide 46, with a new Pod (sandbox) on node 172.31.1.204.]
  48. AWS VPC CNI [Diagram: the CNI connects the Pod (sandbox) to the host with a veth pair (veth1, veth2).]
  50. AWS VPC CNI [Diagram: the veth ends are named eth0 inside the Pod and enixxxx on the host; the Pod gets 172.31.15.74/32 from the IP pool and uses 169.254.1.1 as its gateway.]
  51. Pod ARP/Routing [Screenshot: the Pod's ARP and routing tables; the ARP entry resolves to the MAC address of enixxxx.]
  52. Pod ARP/Routing (cont.) [Screenshot: the host's ARP and routing tables.]
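The screenshots did not survive extraction, but slide 50 and the proposal document linked from slide 42 describe the resulting configuration: inside the Pod, a /32 address, an on-link route to 169.254.1.1, a default route through it, and a static ARP entry pointing 169.254.1.1 at the host-side veth's MAC; on the host, a /32 route to the Pod via `enixxxx`. The plugin programs this through netlink, so the sketch below only prints roughly equivalent iproute2 commands. The MAC address in the usage note is hypothetical, details may differ between plugin versions, and the policy-routing rules used for Pods on secondary ENIs are omitted.

```python
from typing import List

POD_GATEWAY = "169.254.1.1"  # link-local next hop seen by every Pod (slide 50)


def pod_netns_commands(pod_ip: str, host_veth_mac: str, ifname: str = "eth0") -> List[str]:
    """Roughly equivalent iproute2 commands for the Pod's network namespace."""
    return [
        f"ip addr add {pod_ip}/32 dev {ifname}",
        f"ip route add {POD_GATEWAY} dev {ifname} scope link",   # gateway is on-link
        f"ip route add default via {POD_GATEWAY} dev {ifname}",  # everything goes to the host
        # Static ARP entry: 169.254.1.1 resolves to the host-side veth (enixxxx).
        f"ip neigh add {POD_GATEWAY} dev {ifname} lladdr {host_veth_mac} nud permanent",
    ]


def host_commands(pod_ip: str, host_veth: str) -> List[str]:
    """Roughly equivalent iproute2 command for the host namespace."""
    return [f"ip route add {pod_ip}/32 dev {host_veth} scope link"]
```

For the Pod on slide 50, `host_commands("172.31.15.74", "enixxxx")` gives `["ip route add 172.31.15.74/32 dev enixxxx scope link"]`. Because 169.254.1.1 is never a real address on the host, the static ARP entry is what makes the Pod's first hop always land on its own veth peer.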
  53. AWS VPC CNI [Diagram: two nodes in Subnet A (172.31.0.0/20): node 172.31.1.204 hosts a Pod with 172.31.15.74/32 and node 172.31.11.162 hosts a Pod with 172.31.10.79/32; each Pod connects through eth0/enixxxx and uses 169.254.1.1 as its gateway. Steps 1 to 3 mark the packet path.]
  54. Packet flow • 172.31.15.74 pings 172.31.10.79 • Pod (172.31.15.74)
  55. Packet flow • 172.31.15.74 pings 172.31.10.79 • Node (172.31.1.204)
  56. Packet flow • 172.31.15.74 pings 172.31.10.79 • Node (172.31.11.162)
  57. AWS VPC CNI [Diagram: the same two nodes, with the packet 172.31.15.74 -> 172.31.10.79 following steps 1 to 3.]
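The three hops can be replayed with a longest-prefix-match lookup over simplified routing tables. This is a model under stated assumptions: the tables are reduced to prefix/device pairs, the device names `eni-pod-a` and `eni-pod-b` stand in for `enixxxx`, and the VPC fabric's knowledge of which instance owns each secondary IP is a plain dictionary.

```python
import ipaddress
from typing import Dict, List, Tuple

Route = Tuple[str, str]  # (destination prefix, outgoing device)


def lookup(table: List[Route], dst: str) -> str:
    """Longest-prefix match: return the device of the most specific route containing dst."""
    addr = ipaddress.ip_address(dst)
    candidates = [(ipaddress.ip_network(prefix), dev) for prefix, dev in table]
    matches = [(net, dev) for net, dev in candidates if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical tables modeled on the diagram; device names are made up.
POD_TABLE: List[Route] = [("169.254.1.1/32", "eth0"), ("0.0.0.0/0", "eth0")]
NODE_TABLES: Dict[str, List[Route]] = {
    "172.31.1.204": [("172.31.15.74/32", "eni-pod-a"), ("172.31.0.0/20", "eth0"), ("0.0.0.0/0", "eth0")],
    "172.31.11.162": [("172.31.10.79/32", "eni-pod-b"), ("172.31.0.0/20", "eth0"), ("0.0.0.0/0", "eth0")],
}
# The VPC fabric knows which instance each secondary IP is assigned to.
VPC_OWNER: Dict[str, str] = {"172.31.15.74": "172.31.1.204", "172.31.10.79": "172.31.11.162"}


def trace(src_node: str, dst: str) -> List[str]:
    hops = [f"pod: default route via 169.254.1.1 on {lookup(POD_TABLE, dst)}"]  # step 1
    hops.append(f"{src_node}: {lookup(NODE_TABLES[src_node], dst)}")           # step 2
    owner = VPC_OWNER[dst]
    hops.append(f"VPC fabric: deliver to {owner}")
    hops.append(f"{owner}: {lookup(NODE_TABLES[owner], dst)}")                  # step 3
    return hops
```

`trace("172.31.1.204", "172.31.10.79")` ends with `"172.31.11.162: eni-pod-b"`. Note that on each host the /32 route to a local Pod wins over the broader 172.31.0.0/20 subnet route because it is more specific, which is why no NAT or tunnel is involved at any hop.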
  58. Others • Debugging scripts
  59. Others • Debugging IPAMD • Prometheus endpoint • curl http://localhost:61678/metrics • Other information (JSON) • curl http://localhost:61679/v1/pods • curl http://localhost:61679/v1/enis
  60. Limitation • M: number of ENIs • N: number of IP addresses per ENI • Exclude each ENI's primary address • M*(N-1) • t3.medium • M=3, N=6 • 3*(6-1) = 15 https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html
  61. Limitation • Final formula: M*(N-1) + 2 • Two Pods are deployed before the CNI • L-IPAMD • kube-proxy • Both Pods use hostNetwork https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html
  62. Limitation • Magic number 2 • Final formula: M*(N-1) + 2 • Two Pods are deployed before the CNI • L-IPAMD • kube-proxy • Both Pods use hostNetwork https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt
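The formula is easy to check in code. This sketch uses only the t3.medium figures from slide 60; for other instance types, M and N come from the EC2 ENI documentation linked above. `max_pods` and `ENI_LIMITS` are hypothetical names.

```python
# (M, N) = (ENIs per instance, IPv4 addresses per ENI); only t3.medium is taken from the slides.
ENI_LIMITS = {"t3.medium": (3, 6)}


def max_pods(max_enis: int, ips_per_eni: int, host_network_pods: int = 2) -> int:
    """M * (N - 1) secondary IPs for Pods, plus the hostNetwork Pods (L-IPAMD and kube-proxy)."""
    return max_enis * (ips_per_eni - 1) + host_network_pods


m, n = ENI_LIMITS["t3.medium"]
print(max_pods(m, n))  # 17
```

The result, 17, matches the Pod count broken down on the next slide.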
  63. Limitation [Screenshot: 13 test Pods running] • 17 = testing (13) + CoreDNS (2) + kube-proxy (1) + CNI (1)
  64. Summary • Two binaries must be deployed (L-IPAMD, CNI binary) • L-IPAMD is deployed as a K8s DaemonSet • With the help of the AWS VPC CNI • Fewer SNAT/DNAT operations • Better performance than tunneling protocols • Users can apply existing AWS VPC networking and security best practices to K8s clusters
  65. Q&A
