1. Taylor Brown
Principal Program Manager
@taylorb_msft
Dinesh Govindasamy
Principal Engineering Lead
@dingovcloud
Beyond “”: the Path to Windows and Linux Parity in Docker
2. Docker AND Windows
This is not…
• Docker for Windows (it is but we’ll get to that)
• Linux on Windows (again it is but we’ll get to that too)
• Ubuntu on Windows or BASH on Windows (really this one it’s not, sort of)
This is…
• Docker Engine compiled for Windows calling Windows APIs
• Available on Windows 10 and Windows Server 2016 today
3. High Level Architecture in Linux
• containerd + runc
• REST interface
• libcontainerd, graph, libnetwork, plugins
• Control groups: cgroups
• Namespaces: pid, net, ipc, mnt, uts
• Layer capabilities: union filesystems (AUFS, btrfs, vfs, zfs*, DeviceMapper)
• Other OS functionality
• Clients: Docker Client, Docker Registry, Docker Compose, Docker Swarm
4. High Level Architecture in Windows
• ‘containerd’ + runc
• REST interface
• libcontainerd, graph, libnetwork, plugins
• Compute Service
• Control groups: job objects
• Namespaces: object namespace, process table, networking
• Layer capabilities: registry, union-like filesystem extensions
• Other OS functionality
• Clients: Docker Client, Docker Registry, Docker Compose, Docker Swarm
5. Compute Service
• Public interface to containers
• Currently replaces containerd on Windows
• Manages running containers
• Abstracts low-level capabilities
• Language bindings available
• Go: https://github.com/Microsoft/hcsshim (“hcsshim” as in the shim between Docker and the Host Compute Service)
• C#: https://github.com/Microsoft/dotnet-computevirtualization (because .NET stuff needs long names)
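A minimal sketch of what a shim like hcsshim does conceptually: assemble a document describing the container and hand it to the Host Compute Service. The field names below are illustrative only; the authoritative schema is owned by the platform and surfaced through the libraries above.

```python
import json

def make_container_config(name, layer_paths, hyperv_isolation=False):
    """Build a JSON document in the spirit of an HCS container schema.

    Field names here are illustrative, not the real schema: they are
    meant to show the shape of the shim's job, not the actual API.
    """
    config = {
        "SystemType": "Container",
        "Name": name,
        "Layers": [{"Path": p} for p in layer_paths],
        "HvPartition": hyperv_isolation,  # Hyper-V isolation on or off
    }
    return json.dumps(config)
```

In the real shim, the equivalent document is what selects between a process-isolated container and a Hyper-V isolated one.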
10. Windows Containers
[diagram: an app in a Windows Server container shares host user mode via the container runtime; alongside it, apps under Hyper-V isolation each run inside a virtual machine optimized for containers]
11. Namespaces
Silo: an extension to the Windows Job object (the Windows analog of a cgroup)
• Set of processes
• Resource controls
• New: set of namespaces
New namespace virtualization
• Registry
• Process IDs, sessions
• Object namespace
• File system
• Network compartments
13. Container Networking Basics
Linux → Windows
• Network namespaces → Network compartments
• Linux bridge and IP routing → VSwitch
• IP links → vNICs and switch ports
• iptables → Firewall & VFP rules
17. MacVLAN Vs Transparent
[diagrams — Linux: host eth0 carries an 802.1Q trunk with sub-interfaces eth0.10/eth0.20/eth0.30 and macvlan10/20/30 attached via veth to the admin, web-dog, and web-cat containers (VLAN 10: 192.168.10.1, VLAN 20: 192.168.20.1, VLAN 30: 192.168.30.1); Windows: per-container vNICs (VNIC-10, VNIC-20, VNIC-30) attach to a VSwitch bound to the external NIC on the same trunk]
Transparent vs. L2 Bridge / L2 Tunnel
Transparent:
• Physical network learns the container MAC
• VM: MAC spoofing must be enabled
L2 Bridge / L2 Tunnel:
• Container MAC is rewritten to the container host NIC MAC
• More suitable for cloud environments
L2 Bridge:
• Container-to-container traffic is bridged inside the container host
• SDN policies cannot be applied to containers within the host
L2 Tunnel:
• Traffic is tunneled to an external router or the L1 fabric host
• More suitable for extending SDN policies to containers
21. Network Deployment Modes
Multi-host connectivity:
• NAT: no native support
• Overlay: yes
• Transparent: no native support
• L2 Bridge / L2 Tunnel: no native support
Service discovery:
• NAT: only on the local host network
• Overlay: across the cluster
• Transparent: bring your own or host DNS
• L2 Bridge / L2 Tunnel: bring your own or host DNS
Load balancing:
• NAT: internal, local DNS-based
• Overlay: internal, global DNS-based; publish with host mode
• Transparent: no native support
• L2 Bridge / L2 Tunnel: no native support
IP addressing:
• NAT: internal addressing per container (scoped per NAT)
• Overlay: internal addressing per container (scoped per overlay)
• Transparent: external addressing per container (physical network)
• L2 Bridge / L2 Tunnel: external addressing per container (physical network)
Requirements:
• NAT: Engine 1.7+
• Overlay: Engine 1.13+, cluster in Swarm mode, KB4015217
• Transparent: Engine 1.7+, Windows Server; enable MAC spoofing for the VM host interface
• L2 Bridge / L2 Tunnel: Engine 1.7+, Windows Server
For the past year we have been working extensively on the Windows platform to support Docker networking, specifically enabling Docker Swarm on Windows. This would not have been possible without the support of Madhu's team at Docker. We are happy to announce that overlay network mode is available in Windows Server 2016 as of last Tuesday's Windows Update; an announcement should be coming soon. This is a great testament to the amazing partnership we have with Docker.
In this session we are going to cover some basics, a deep dive into the different networking modes in Windows and how they compare with Linux, and a cool demo of Docker Swarm spanning Windows and Linux.
Let's look at the Linux networking building blocks that the Docker networking architecture is built upon, how they compare with Windows, and how we have developed the Windows networking drivers.
Linux has network namespaces; the Windows equivalent is the network compartment. Conceptually, compartments are logical containers in the TCP/IP stack. The network layer in TCP/IP ensures that each compartment is isolated and that packet forwarding between compartments is prevented. All IP objects, such as interfaces, IP addresses, routes, and prefixes, live inside one and only one compartment.
Layer 2 switching functionality is provided by the Linux bridge. In Windows, the VSwitch provides layer 2 switching and layer 3 routing services. You can have multiple instances of the VSwitch, and switch ports can be dynamically added to and deleted from each one. Each VSwitch instance has its own forwarding table and forwards packets based on the MAC address and VLAN tag of the packets.
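The per-VSwitch forwarding behaviour can be sketched as a toy learning switch keyed on (MAC, VLAN); the class, method names, and port numbering below are invented for illustration.

```python
class VSwitchSketch:
    """Toy model of a per-VSwitch forwarding table.

    Frames are forwarded on (destination MAC, VLAN tag); unknown
    destinations flood to all ports in that VLAN, as in any
    learning switch.
    """
    def __init__(self):
        self.table = {}   # (mac, vlan) -> port learned from
        self.ports = {}   # port -> set of VLANs allowed on it

    def add_port(self, port, vlans):
        self.ports[port] = set(vlans)

    def receive(self, in_port, src_mac, dst_mac, vlan):
        # Learn the source MAC on the ingress port.
        self.table[(src_mac, vlan)] = in_port
        # Known destination: forward to exactly one port.
        if (dst_mac, vlan) in self.table:
            return [self.table[(dst_mac, vlan)]]
        # Unknown: flood within the VLAN, excluding the ingress port.
        return [p for p, vs in self.ports.items()
                if vlan in vs and p != in_port]
```

The VLAN filter on flooding is what keeps the per-VLAN container networks from the previous slide isolated from each other on a shared switch.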
Linux uses veth pairs. In Windows, container network interfaces (host vNICs or VM NICs) are added to each compartment and then bound to the corresponding switch port on the VSwitch.
iptables in Linux provides rich packet filtering. In Windows, we use VFP, the Virtual Filtering Platform. VFP is a programmable, match-action based filtering engine. It offers rich data-plane primitives that let you apply actions on packets, such as encap/decap, stateful NAT, ACLs, metering, and so on.
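A sketch of the match-action idea, with toy rules standing in for VFP primitives; `apply_rules`, the field names, the port, and the example IP are all invented for illustration, not a VFP API.

```python
def apply_rules(packet, rules):
    """First-match rule engine over packet fields (match-action sketch).

    Each rule is (match, action): `match` is a dict of field values that
    must all equal the packet's, `action` maps a packet to a new packet
    (or None to drop). The actions are toy stand-ins for primitives
    like stateful NAT, encap/decap, ACLs, and metering.
    """
    for match, action in rules:
        if all(packet.get(k) == v for k, v in match.items()):
            return action(dict(packet))
    return packet  # no rule matched: pass through unchanged

# Illustrative layer: drop SMB, then SNAT outbound traffic to a made-up
# host IP (both the port number and the IP are just examples).
rules = [
    ({"dst_port": 445}, lambda p: None),                            # ACL: drop
    ({"direction": "out"}, lambda p: {**p, "src_ip": "10.0.0.4"}),  # SNAT
]
```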
As you all know, the Docker networking architecture is built upon a set of interfaces called the Container Network Model. On Windows too, all the constructs and Docker CLI options for networking remain the same as on Linux.
The Windows network drivers call a new abstraction layer called the Host Network Service, which is responsible for setting up container networking in Windows.
Now let's look at the different network modes we have in Windows and how they compare against Linux.
The default network mode in Linux is bridge mode; the corresponding default mode in Windows is NAT mode.
For NAT mode, we create an internal VSwitch, which is a private VSwitch with an additional gateway NIC that enables connectivity to the host partition. We also create a NAT between the gateway NIC and the external NIC. Containers within the NAT network get switched in the VSwitch, and traffic to the internet gets NATed to the container host IP.
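The NAT-mode plumbing can be sketched as a small model: internal addresses scoped to the NAT network, inbound DNAT for published ports, outbound SNAT to the host IP. The class, the 172.19.0.0/24 prefix, and the port numbers are illustrative, not what HNS actually allocates.

```python
import itertools

class NatNetworkSketch:
    """Toy model of Windows NAT mode networking."""
    def __init__(self, host_ip, prefix="172.19.0."):
        self.host_ip = host_ip
        self.prefix = prefix
        self._suffix = itertools.count(2)  # .1 plays the gateway NIC
        self.port_map = {}                 # host port -> (container ip, port)

    def attach_container(self):
        """Hand out the next internal address on the NAT network."""
        return self.prefix + str(next(self._suffix))

    def publish(self, host_port, container_ip, container_port):
        """Record a published port mapping on the host."""
        self.port_map[host_port] = (container_ip, container_port)

    def inbound(self, host_port):
        """Where traffic to host_ip:host_port gets DNATed, if published."""
        return self.port_map.get(host_port)

    def outbound(self, container_ip):
        """Outbound traffic is SNATed to the container host IP."""
        return self.host_ip
```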
If you want to configure your container to use the underlay network, you would use the MacVLAN driver in Linux. In Windows, we have three different network modes that let you use the underlay network. For all of them, we create an external VSwitch, which enables your containers to connect to both the host partition and the physical network.
In transparent network mode, we let the container MAC address pass through the VSwitch and let the physical network learn the container MACs. You need to enable MAC spoofing on the network interface if you are running transparent network mode in a virtual machine.
In L2 bridge mode, we rewrite the container MAC with the container host MAC. This avoids flooding the physical network with all those container MACs. Both L2 bridge and L2 tunnel modes are more suited for cloud environments.
In L2 bridge mode, container-to-container traffic is bridged within the container host, whereas in L2 tunnel mode the traffic is tunneled to the external router (in the Azure case, to the L1 fabric host) and then hairpinned back to the destination container. This mode enables you to apply SDN policies for containers on the host.
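The MAC handling that distinguishes transparent mode from the L2 modes boils down to one choice at egress; the function name and MAC values below are illustrative.

```python
def egress_source_mac(container_mac, host_nic_mac, mode):
    """Which source MAC the physical fabric sees, per network mode.

    transparent: the container's own MAC leaves the host, so the fabric
    must learn it (and a VM needs MAC spoofing enabled on its NIC).
    l2bridge / l2tunnel: the source MAC is rewritten to the host NIC's
    MAC, so the fabric only ever sees one MAC per container host.
    """
    if mode == "transparent":
        return container_mac
    if mode in ("l2bridge", "l2tunnel"):
        return host_nic_mac
    raise ValueError("unknown mode: " + mode)
```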
Let's look at the internal architecture of the overlay network in Windows. In Linux, two bridges are created: one for the overlay network and the other for traffic outside of the cluster. In Windows too, we create two VSwitches: one is an external switch, bound to the external NIC with VFP enabled, which does the encap and decap; the other is a NAT network. In both Linux and Windows, two interfaces are added to the container, one connected to the overlay and the other connected to the NAT.
Docker Engine has an internal DNS server that provides name resolution to all the containers on the host in NAT and overlay network modes. It is implemented a little differently than on Linux. In Windows, we use the gateway IP as the DNS server in each container, and the Docker engine on the host runs a DNS server on the gateway NIC. When a DNS query comes in, Docker Engine checks whether the query belongs to a container or service on a network that the requesting container belongs to. If it does, Docker Engine looks up the IP address that matches the container, task, or service name in its key-value store and returns that IP to the requester.
Service discovery is network-scoped: containers not on the same network cannot resolve each other's addresses.
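That network-scoped lookup can be sketched as follows; the class and the network and container names are invented for illustration.

```python
class EmbeddedDnsSketch:
    """Network-scoped service discovery, in the spirit of the engine's
    embedded DNS: a name resolves only when the requesting container
    shares a network with the target."""
    def __init__(self):
        self.records = {}    # network -> {name: ip}
        self.networks = {}   # name -> set of networks it is attached to

    def register(self, network, name, ip):
        self.records.setdefault(network, {})[name] = ip
        self.networks.setdefault(name, set()).add(network)

    def resolve(self, requester, name):
        # Only consult networks the requester is attached to.
        for net in self.networks.get(requester, ()):
            ip = self.records.get(net, {}).get(name)
            if ip is not None:
                return ip
        return None  # the real engine would fall through to external DNS
```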
Publishing Ports
Docker supports two ways of publishing service ports outside of the swarm: using the routing mesh, or using publish mode host, where the service port is published directly from the host. We don't yet support the routing mesh on Windows, but we do support publishing ports using host mode. You can use an external load balancer to load-balance across the tasks in your service, which is what we will demo here too.
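The external load-balancing pattern over host-mode published ports can be sketched simply: with publish mode host, each task is reachable at its own node's IP, so an external balancer just rotates through (host, port) pairs. The class and endpoint names are illustrative.

```python
import itertools

class ExternalRoundRobin:
    """Round-robin over host-mode published task endpoints, standing in
    for whatever external load balancer fronts the swarm nodes."""
    def __init__(self, endpoints):
        # endpoints: list of (host, port) pairs, one per task
        self._cycle = itertools.cycle(endpoints)

    def pick(self):
        """Return the next endpoint to send a request to."""
        return next(self._cycle)
```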
Deployment Modes
In this slide, we are going to look at the different network modes we have in Windows and how they differ from each other with respect to physical network design, configuration, and how they interoperate with applications.
Multi-host connectivity: NAT doesn't provide any native support. Overlay supports multi-host connectivity. For transparent and L2 modes, we expect the underlay to provide routing for multi-host connectivity.
Service discovery: we use the Docker embedded DNS server for NAT and overlay modes. For the other modes, we expect DNS to be hosted externally.
Load balancing: DNS round-robin (DNSRR) is currently the only supported mode of load balancing in Windows, for NAT and overlay modes. For the other modes, we don't have any native support.
IP addressing: both NAT and overlay have internal addressing scoped to the network. For transparent and L2 modes, we support external, public-facing IP assignment to the containers.
Requirements: you need the listed KB for overlay network mode. For transparent network mode, if you are using a VM, you need to make sure MAC spoofing is enabled on the VM's network interface.