
Deeper Dive in Docker Overlay Networks

Docker, Inc.
Oct. 20, 2017

  1. Deeper Dive in Docker Overlay Networks Laurent Bernaille @lbernail CTO D2SI
  2. Agenda
Reminder on the Docker overlay
VXLAN control plane options
Using BGP as a dynamic control plane
What can we do with this?
  3. Reminder on the Docker overlay
  4. The Docker Overlay network
docker0:~$ docker network create --driver overlay --subnet 192.168.0.0/24 dockercon
d099dcc709daddbc0e143c24e7091bef6b13bdc3abb379473af4582bf1e112b1
docker1:~$ docker network ls
NETWORK ID     NAME        DRIVER    SCOPE
d099dcc709da   dockercon   overlay   global
docker0:~$ docker run -d --ip 192.168.0.100 --net dockercon --name C0 debian sleep infinity
docker1:~$ docker run -it --rm --net dockercon debian
root@950d67e96db7:/# ping 192.168.0.100
PING 192.168.0.100 (192.168.0.100): 56 data bytes
64 bytes from 192.168.0.100: seq=0 ttl=64 time=1.153 ms
  5. Docker Overlay: Data plane
[Diagram: C0 (eth0 192.168.0.100) on docker0 (10.0.0.10) and C1 on docker1 (10.0.0.11); each container's eth0 connects through a veth to br0 and a vxlan interface in the overlay namespace. A ping from C1 is encapsulated as: outer IP src 10.0.0.11 / dst 10.0.0.10, UDP src X / dst 4789, VXLAN VNI, original L2 frame src 192.168.0.Y / dst 192.168.0.100.]
  6. What is VXLAN? (VXLAN: Virtual eXtensible LAN; VNI: VXLAN Network Identifier; VTEP: VXLAN Tunnel Endpoint)
• Tunneling technology over UDP (L2 in UDP)
• Developed for cloud SDN to create multi-tenancy: without the need for L2 connectivity, and without the usual VLAN limit (4096 VLAN IDs)
• Easy to encrypt: IPsec
• Overhead: 50 bytes
• In Linux: started with Open vSwitch; native with kernel >= 3.7, and >= 3.16 for namespace support
Packet format: outer IP packet | UDP dst: 4789 | VXLAN header | original L2 frame
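The 50-byte figure is simple arithmetic over the encapsulation headers, and it explains the mtu 1450 used on the veth interfaces later in this deck:

# VXLAN encapsulation overhead per packet:
#   outer Ethernet header : 14 bytes
#   outer IPv4 header     : 20 bytes
#   outer UDP header      :  8 bytes
#   VXLAN header          :  8 bytes
#   total                 : 50 bytes
# With a 1500-byte physical MTU, inner interfaces are set to
# 1500 - 50 = 1450 so encapsulated frames still fit on the wire.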
  7. Let's build an overlay "manually"
[Diagram: two hosts on 10.0.0.0/16, docker0 (10.0.0.10) and docker1 (10.0.1.10).]
  8. Overlay namespaces
[Diagram: on each host, a dedicated namespace containing a bridge br42 and a VXLAN interface vxlan42.]
  9. Creating the overlay namespace (setup_vxlan script)
ip netns add overns                                           # create overlay NS
ip netns exec overns ip link add dev br42 type bridge         # create bridge in NS
ip netns exec overns ip addr add dev br42 192.168.0.1/24
ip link add dev vxlan42 type vxlan id 42 proxy dstport 4789   # create VXLAN interface
ip link set vxlan42 netns overns                              # move it to NS
ip netns exec overns ip link set vxlan42 master br42          # add it to bridge
ip netns exec overns ip link set vxlan42 up                   # bring all interfaces up
ip netns exec overns ip link set br42 up
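To check what the script built (a quick verification, not part of the original deck):

# Inspect the namespace: -d shows VXLAN details (VNI 42, dstport 4789, proxy)
sudo ip netns exec overns ip -d link show vxlan42
# vxlan42 should appear as a port of br42
sudo ip netns exec overns bridge link show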
  10. Attach containers
[Diagram: C0 (eth0 192.168.0.10) on docker0 (10.0.0.10) and C1 (eth0 192.168.0.20) on docker1 (10.0.1.10), each attached through a veth pair to br42 next to vxlan42 in the overlay namespace.]
  11. Create containers and attach them (plumb script)
On docker0:
docker run -d --net=none --name=demo debian sleep infinity                        # create container without net
ctn_ns_path=$(docker inspect --format="{{ .NetworkSettings.SandboxKey}}" demo)    # get NS for container
ctn_ns=${ctn_ns_path##*/}
ip link add dev veth1 mtu 1450 type veth peer name veth2 mtu 1450                 # create veth
ip link set dev veth1 netns overns                                                # send veth1 to overlay NS
ip netns exec overns ip link set veth1 master br42                                # attach it to overlay bridge
ip netns exec overns ip link set veth1 up
ip link set dev veth2 netns $ctn_ns                                               # send veth2 to container
ip netns exec $ctn_ns ip link set dev veth2 name eth0 address 02:42:c0:a8:00:10   # rename & configure
ip netns exec $ctn_ns ip addr add dev eth0 192.168.0.10
ip netns exec $ctn_ns ip link set dev eth0 up
On docker1: same with 192.168.0.20 / 02:42:c0:a8:00:20
  12. Does it ping?
docker0:~$ docker exec -it demo ping 192.168.0.20
PING 192.168.0.20 (192.168.0.20): 56 data bytes
92 bytes from 192.168.0.10: Destination Host Unreachable
Not yet: the overlay namespace has no ARP or forwarding information, so we add it by hand:
docker0:~$ sudo ip netns exec overns ip neighbor show
docker0:~$ sudo ip netns exec overns ip neighbor add 192.168.0.20 lladdr 02:42:c0:a8:00:20 dev vxlan42
docker0:~$ sudo ip netns exec overns bridge fdb add 02:42:c0:a8:00:20 dev vxlan42 self dst 10.0.1.10 vni 42 port 4789
On docker1: same with 192.168.0.10, 02:42:c0:a8:00:10 and 10.0.0.10
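To confirm the entries landed where expected before retrying the ping (not shown in the deck):

# The manually-added neighbor entry shows up as PERMANENT
sudo ip netns exec overns ip neighbor show
# The FDB entry maps the remote MAC to VTEP 10.0.1.10
sudo ip netns exec overns bridge fdb show dev vxlan42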
  13. Result
[Diagram: same setup as before, now with ARP and FDB entries in each overlay namespace; the ping from C0 (192.168.0.10) to C1 (192.168.0.20) succeeds over vxlan42.]
  14. VXLAN Control Plane options
  15. VXLAN control plane options - 1: Multicast
Use a multicast group (239.x.x.x) for traffic to unknown L3/L2 addresses (ARP: who has 192.168.0.2? L2 discovery: where is 02:42:c0:a8:00:02?).
PROS: simple and efficient
CONS: multicast connectivity is not always available (on public clouds, for instance)
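A minimal iproute2 sketch of this option (the group address and underlay interface are illustrative):

# Join a multicast group: unknown destinations and broadcast/ARP traffic
# are sent to the group, and data-plane learning fills the FDB.
sudo ip link add vxlan42 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789
sudo ip link set vxlan42 up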
  16. VXLAN control plane options - 2: Point-to-point
Configure a remote IP address to which traffic for unknown addresses is sent.
PROS: simple, no need for multicast, very good for two hosts
CONS: difficult to manage with more than 2 hosts
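The same sketch for point-to-point, assuming 10.0.1.10 is the single remote VTEP:

# Everything unknown is forwarded to the one remote endpoint
sudo ip link add vxlan42 type vxlan id 42 remote 10.0.1.10 dev eth0 dstport 4789
sudo ip link set vxlan42 up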
  17. VXLAN control plane options - 3: User-land
The kernel does nothing by itself; a daemon provides the ARP / FDB information from outside (ARP: do you know 192.168.0.2? L2: where is 02:42:c0:a8:00:02?).
PROS: very flexible
CONS: requires a daemon and a centralized database of addresses
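A sketch of the kernel side of this option: the proxy/l2miss/l3miss flags make the kernel answer ARP from its own tables and signal misses that a daemon can react to:

# proxy: answer ARP from the neighbor table; l2miss/l3miss: emit netlink
# notifications when a MAC or IP lookup fails.
sudo ip link add vxlan42 type vxlan id 42 proxy l2miss l3miss dstport 4789
# The daemon watches for misses and installs entries, for example:
ip monitor neigh                                                    # watch miss events
sudo ip neighbor add 192.168.0.2 lladdr 02:42:c0:a8:00:02 dev vxlan42
sudo bridge fdb add 02:42:c0:a8:00:02 dev vxlan42 self dst 10.0.1.10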
  18. Docker overlay control plane (option 3: user-land)
[Diagram: dockerd on each host populates the ARP and FDB tables of the overlay namespace; cluster state is shared through consul/swarm and Serf/Gossip (with NAT where needed). The data plane is unchanged: outer IP src 10.0.0.11 / dst 10.0.0.10, UDP src X / dst 4789, VXLAN VNI, original L2 frame src 192.168.0.Y / dst 192.168.0.100.]
  19. "Deep Dive in Docker Overlay Networks", Dockercon Austin 2017 Slides Video Blog Posts That was a lot of information
  20. Using BGP as a dynamic control plane
  21. VXLAN control plane - option 4: BGP EVPN
Rely on the BGP EVPN address family to distribute L2 and L3 data: endpoint information is distributed with BGP, with a bgpd running next to each vxlan interface.
PROS: BGP is a standard way to distribute addresses, supported by SDN vendors
CONS: limited Linux implementations, requires some BGP knowledge
  22. BGP in one slide
● Routing protocol between network entities ("Autonomous Systems", AS): Google ASN 15169, Amazon ASN 16509 (both actually have more than one)
● BGP is an EGP (Exterior Gateway Protocol), as opposed to IGPs (Interior Gateway Protocols: OSPF, EIGRP, IS-IS); with an IGP the next hop is the IP of a router, with BGP the next hop is an Autonomous System
● BGP is what makes the Internet work
● BGP scales very well: 500,000+ prefixes in a full Internet table
  23. A quick BGP example
[Diagram: AS1 originates 20.0.0.0/16 and announces it over eBGP to AS2 and AS4; AS2 re-announces 20.0.0.0/16: AS2-AS1 and AS4 re-announces 20.0.0.0/16: AS4-AS1; AS5 learns 20.0.0.0/16: AS5-AS4-AS1; AS3 receives several paths and picks the shortest AS path.]
AS: Autonomous System; eBGP: external (different AS); iBGP: internal (same AS)
  24. iBGP: distributing BGP information within an Autonomous System
iBGP requires a full mesh between all peers: n peers => n * (n-1) / 2 connections (50 peers => 1225 connections, 49 on each host).
Route reflectors (RR) simulate the mesh: more scalable and simpler, and it is possible to have more than one RR.
  25. BGP EVPN
● Part of MP-BGP (Multi-Protocol BGP: not only IP prefixes)
● Announces VXLAN information instead of IP prefixes; L3: IP addresses of VXLAN endpoints (VTEPs); L2: location of MAC addresses
● BUM (Broadcast, Unknown unicast, Multicast) traffic is unicasted to all VTEPs
● Gets the scalability of BGP
  26. Environment
[Diagram: docker0 (10.0.0.10) and docker1 (10.0.1.10) on 10.0.0.0/16, with two route reflectors running quagga-rr: RR1 (10.0.0.5) and RR2 (10.0.1.5).]
  27. Creating our BGP clients on Docker hosts
docker0:~$ docker run -t -d --privileged --name quagga -p 179:179 --hostname docker0 -v $(pwd)/quagga:/etc/quagga cumulusnetworks/quagga
(--privileged so the container can modify routing/forwarding)

BGP configuration on docker0:
router bgp 65000
 bgp router-id 10.0.0.10
 no bgp default ipv4-unicast
 neighbor reflectors peer-group
 neighbor reflectors remote-as 65000
 neighbor reflectors capability extended-nexthop
 neighbor 10.0.0.5 peer-group reflectors
 neighbor 10.0.1.5 peer-group reflectors
 address-family evpn
  neighbor reflectors activate
  advertise-all-vni

BGP configuration on the route reflectors:
router bgp 65000
 bgp router-id 10.0.0.5
 bgp cluster-id 111.111.111.111
 no bgp default ipv4-unicast
 neighbor docker peer-group
 neighbor docker remote-as 65000
 bgp listen range 10.0.0.0/16 peer-group docker
 address-family evpn
  neighbor docker activate
  neighbor docker route-reflector-client
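The deck does not show how the reflectors themselves are started; by analogy with the docker0 command above, something like this (container name, hostname, and config directory are assumptions):

# On RR1 (10.0.0.5), with the reflector configuration in ./quagga-rr
RR1:~$ docker run -t -d --privileged --name quagga-rr -p 179:179 --hostname rr1 -v $(pwd)/quagga-rr:/etc/quagga cumulusnetworks/quagga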
  28. What we have so far
[Diagram: the quagga container on each Docker host peers over eth0 with both route reflectors (RR1 10.0.0.5 and RR2 10.0.1.5).]
  29. Let's look at the BGP data
docker0:~$ docker exec -it quagga vtysh
docker0# show run
docker0# show bgp neighbors
docker0# show bgp evpn summary
BGP router identifier 10.0.0.10, local AS number 65000 vrf-id 0
Peers 2, using 42 KiB of memory

Neighbor           V  AS     MsgRcvd MsgSent TblVer InQ OutQ Up/Down  State/PfxRcd
quagga0(10.0.0.5)  4  65000  42      43      0      0   0    00:02:01 0
quagga1(10.0.1.5)  4  65000  42      43      0      0   0    00:02:01 0

docker0# show bgp evpn route
No EVPN prefixes exist
  30. Configuring VXLAN interfaces
sudo ./setup_vxlan 42 container:quagga dstport 4789 nolearning   <= only learn through EVPN
[Diagram: br42 and vxlan42 are now created inside the quagga container on each host, alongside the BGP peering to the route reflectors.]
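Under the hood, the nolearning argument presumably lands on the ip command run by setup_vxlan; the raw equivalent would be:

# Disable data-plane MAC learning: the FDB is then populated only through
# BGP EVPN (zebra installs the entries quagga learns from the reflectors).
ip link add dev vxlan42 type vxlan id 42 dstport 4789 nolearning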
  31. Let's look at the BGP data
docker0:~$ docker exec -it quagga vtysh
docker0# show bgp evpn route
BGP table version is 0, local router ID is 10.0.0.10
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network                   Next Hop    Metric LocPrf Weight Path
Route Distinguisher: 10.0.0.10:1
*> [3]:[0]:[32]:[10.0.0.10]  10.0.0.10                 32768 i
Route Distinguisher: 10.0.1.10:1
*>i[3]:[0]:[32]:[10.0.1.10]  10.0.1.10   0      100    0     i

docker0# show evpn mac vni all
  32. Let's add containers and try pinging
docker0:~$ sudo ./plumb br42@quagga demo 192.168.0.10/24@192.168.0.1 02:42:c0:a8:00:10
docker1:~$ sudo ./plumb br42@quagga demo 192.168.0.20/24@192.168.0.1 02:42:c0:a8:00:20
[Diagram: a demo container on each host (192.168.0.10 and 192.168.0.20), attached to br42 inside the quagga containers.]
  33. What about BGP?
docker0:~$ docker exec -it quagga vtysh
docker0# show bgp evpn route
BGP table version is 0, local router ID is 10.0.0.10
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

Route Distinguisher: 10.0.1.10:1
*>i[2]:[0]:[0]:[48]:[02:42:c0:a8:00:20]  10.0.1.10  0  100  0 i
* i[3]:[0]:[32]:[10.0.1.10]              10.0.1.10  0  100  0 i

docker0# show evpn mac vni all
VNI 42 #MACs (local and remote) 2
MAC                Type    Intf/Remote VTEP  VLAN
02:42:c0:a8:00:10  local   veth0pldemo
02:42:c0:a8:00:20  remote  10.0.1.10
  34. Overview
[Diagram: control plane: each quagga peers with the route reflectors over eth0; data plane: VXLAN traffic flows directly between the vxlan42 interfaces, carrying the demo containers' 192.168.0.10 <-> 192.168.0.20 traffic.]
  35. What's interesting about this setup?
● Standard VXLAN address distribution (used on many routers)
● Full management of BUM traffic: ARP queries, broadcasts (DHCP), multicast (discovery, keepalived)
● BUM traffic is unicasted to every VTEP (not efficient); possible optimization: ARP suppression (Cumulus Quagga)
  36. What can we do with this?
  37. What if we want a second overlay?
docker0:~$ sudo ./setup_vxlan 66 container:quagga dstport 4789 nolearning
docker0:~$ docker run -d --net=none --name=demo66 debian sleep infinity
docker0:~$ sudo ./plumb br66@quagga demo66 192.168.66.10/24 02:42:c0:a8:66:10
[Diagram: each host now carries a second bridge br66 with vxlan66, hosting demo66 containers (192.168.66.10 and 192.168.66.20) alongside the VNI 42 overlay.]
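The matching steps on docker1 are not on the slide; by symmetry with the first overlay they would be:

docker1:~$ sudo ./setup_vxlan 66 container:quagga dstport 4789 nolearning
docker1:~$ docker run -d --net=none --name=demo66 debian sleep infinity
docker1:~$ sudo ./plumb br66@quagga demo66 192.168.66.20/24 02:42:c0:a8:66:20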
  38. What about BGP?
docker0:~$ docker exec -it quagga vtysh
docker0# show evpn vni
Number of VNIs: 2
VNI  VxLAN IF  VTEP IP  # MACs  # ARPs  # Remote VTEPs
42   vxlan42   0.0.0.0  2       0       1
66   vxlan66   0.0.0.0  2       0       1

docker0# show evpn mac vni all
VNI 42 #MACs (local and remote) 2
MAC                Type    Intf/Remote VTEP  VLAN
02:42:c0:a8:00:10  local   veth0pldemo
02:42:c0:a8:00:20  remote  10.0.1.10
VNI 66 #MACs (local and remote) 2
MAC                Type    Intf/Remote VTEP  VLAN
02:42:c0:a8:66:10  local   veth0pldemo66
02:42:c0:a8:66:20  remote  10.0.1.10
  39. Taking advantage of broadcast: DHCP
[Diagram: a dhcp container (192.168.0.254) attached to br42 on docker0; a demodhcp container on docker1 sends its DHCP broadcast across the overlay to obtain its address.]
  40. Configuring DHCP
DHCP server configuration:
subnet 192.168.0.0 netmask 255.255.255.0 {
  range 192.168.0.100 192.168.0.200;
  option routers 192.168.0.1;
  option domain-name-servers 8.8.8.8;
}

docker0:~$ docker run -d --net=none --name dhcp -v "$(pwd)/dhcp":/data networkboot/dhcpd eth0
docker0:~$ sudo ./plumb br42@quagga dhcp 192.168.0.254/24
docker1:~$ docker run -d --net=none --name=demodhcp debian sleep infinity
docker1:~$ sudo ./plumb br42@quagga demodhcp dhcp
docker1:~$ docker exec -it demodhcp ping 192.168.0.10
PING 192.168.0.10 (192.168.0.10): 56 data bytes
64 bytes from 192.168.0.10: icmp_seq=0 ttl=47 time=1.566 ms
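Note the second plumb invocation ends with dhcp instead of a static address; presumably the script runs a DHCP client inside the container's namespace, along these lines (a sketch, assuming dhclient is available on the host):

# Broadcast a DHCP discover on eth0; as BUM traffic it is unicasted to all
# VTEPs and reaches the dhcp container at 192.168.0.254.
ip netns exec $ctn_ns dhclient eth0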
  41. Getting out of our Docker environment
[Diagram: a third host, gateway0 (10.0.0.20), joins the overlay directly in the host namespace: br42 and vxlan42, plus a vethgw interface holding the gateway address 192.168.0.1.]
  42. Getting out of our Docker environment
gateway0:~$ ./setup_vxlan 42 host dstport 4789 nolearning
gateway0:~$ ip link add dev vethbr type veth peer name vethgw
gateway0:~$ ip link set vethbr master br42
gateway0:~$ ip link set vethbr up                 # veth pairs are created DOWN
gateway0:~$ ip link set vethgw up
gateway0:~$ ip addr add 192.168.0.1/24 dev vethgw
gateway0:~$ ping 192.168.0.10
PING 192.168.0.10 (192.168.0.10): 56 data bytes
64 bytes from 192.168.0.10: icmp_seq=0 ttl=47 time=0.866 ms
[Diagram: vethbr attached to br42 next to vxlan42; vethgw (192.168.0.1) is the host-side end of the pair.]
  43. Getting out of VXLAN / Quagga
[Diagram: gateway0 routes between the overlay (192.168.0.0/24) and the underlay (10.0.0.0/16) and NATs other destinations out of eth0; a non-VXLAN host (10.0.0.30) can then reach the overlay.]
  44. Getting out of VXLAN / Quagga
gateway0:~$ echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
gateway0:~$ iptables -t nat -A POSTROUTING ! -d 10.0.0.0/16 -s 192.168.0.0/24 -o eth0 -j MASQUERADE

docker1:~$ docker exec -it demodhcp ping 192.168.0.1   <= local (VXLAN)
docker1:~$ docker exec -it demodhcp ping 10.0.0.30     <= routed
docker1:~$ docker exec -it demodhcp ping 8.8.8.8       <= NATed
simple1:~$ ping 192.168.0.1
simple1:~$ ping 192.168.0.10
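For simple1's pings to work, the non-VXLAN host also needs a return route to the overlay through gateway0 (implied by the route arrow on the previous slide; a sketch):

# On simple1 (10.0.0.30): reach 192.168.0.0/24 via gateway0
simple1:~$ sudo ip route add 192.168.0.0/24 via 10.0.0.20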
  45. Another nice thing we can do
[Diagram: on gateway0, a QEMU virtual machine attached through a tap0 interface on br42 runs dhclient and gets an address (192.168.0.10x) from the overlay's DHCP server.]
  46. What could a real-life setup look like?
[Diagram: many Docker hosts running quagga peer with two route reflectors (RR1, RR2); a BGP/EVPN router bridges the VXLAN domain and the routed infrastructure, announcing routes to the VXLAN networks and importing routes from the non-VXLAN infrastructure for standard hosts.]
  47. How does it compare to other solutions?
Solution             | Data plane          | Control plane
Swarm Classic        | VXLAN               | External KV store (Consul / etcd)
SwarmKit             | VXLAN               | SwarmKit (Raft / Gossip implementation)
Flannel host-gw      | Routing             | etcd / Kubernetes API
Flannel VXLAN        | VXLAN               | etcd / Kubernetes API
Calico               | Routing / IPIP      | etcd / BGP (IP prefixes)
Weave Classic        | Custom              | Custom
Weave Fast Datapath  | VXLAN               | Custom
Contiv               | VXLAN, routing, L2  | etcd / BGP (IP and maybe EVPN)
Disclaimer: almost no experience with any of these (from documentation and discussions mostly)
  48. Perspectives
● FRRouting: the Quagga fork; Cumulus has switched to FRRouting and merged EVPN support
● Open vSwitch: alternative to the native Linux bridge and VXLAN interfaces; (possibly) better performance and more features; not sure how Quagga/FRRouting would integrate with Open vSwitch
● Performance: measure the impact of VXLAN; test VXLAN acceleration when available on NICs
● CNI plugin (to test on Kubernetes, and mostly for learning)
  49. Thank you! Questions?
https://github.com/lbernail/dockercon2017
@lbernail