Cilium - Fast IPv6 Container Networking with BPF and XDP

We present a new open source project that provides IPv6 networking for Linux containers by generating a program for each individual container on the fly and running it as JITed BPF code in the kernel. Because the code is generated and compiled per container, each program is reduced to the minimally required feature set and can be heavily optimised by the compiler, as parameters become compile-time constants. The upcoming addition of the Express Data Path (XDP) to the kernel will make this approach even more efficient, as the programs will be invoked directly from the network driver.

Published in: Software

  1. Cilium: Fast IPv6 Container Networking with BPF and XDP
     LinuxCon 2016, Toronto
     Thomas Graf (@tgraf__)
     Kernel, Cilium & Open vSwitch Team
     Noiro Networks (Cisco)
  2. The Cilium Experiment
     Scale
     – Addressing: IPv6?
     – Policy: Linear lists don’t scale. Alternative?
     Extensibility
     – Can we be as extensible as userspace networking in the kernel?
     Simplicity
     – What is an appropriate abstraction away from traditional networking?
     Performance
     – Do we sacrifice performance in the process?
  3. Scaling Addressing
     Solution:
     – IPv6 addresses with host scope allocator
     Pros:
     – Everything is globally addressable
     – No NAT
     – Path to ILA for mobility of tasks
     Cons:
     – Legacy IPv4 only endpoints/applications
       → Optional IPv4 addressing (+ NAT)
       → NAT46: Provide IPv6 only applications to IPv4 only clients
  4. IPv6 Status in Kubernetes/Docker
     ● Kubernetes (CNI): Almost there
       – Pods are IPv6-only capable as of k8s 1.3.6 (PR23317, PR26438, PR26439, PR26441)
       – Kubeproxy (services) not done yet
     ● Docker (libnetwork): Working on it
       – PR826 - “Make IPv6 Great Again”: not merged yet
  5. Scaling Policy
     [Diagram: policy graph with LB, Frontend, Backend]
  6. Scaling Policy
     [Diagram: containers labeled LB, FE, BE; policy graph with LB, Frontend, Backend]
     NetworkPolicy: Kubernetes policy spec as discussed and standardized in the Networking SIG
     https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/network-policy.md
  7. Scaling Policy
     [Diagram: containers labeled LB, FE, BE and QA or Prod; policy graph with LB, Frontend, Backend, QA, Prod]
  8. Scaling Policy
     [Diagram: containers labeled LB, FE, BE and QA or Prod; policy graph with LB, Frontend, Backend, plus QA and Prod each marked "requires"]
     The "requires" relation is a Cilium extension, not yet part of the Kubernetes spec.
  9. Scaling Policy Enforcement
     [Diagram: policy graph from the previous slide (LB, FE, BE; QA, Prod; "requires")]
     Distributed Label ID Table: the label combinations (LB QA, FE QA, LB Prod, BE QA, FE Prod, BE Prod) are assigned numeric IDs (10 to 15).
     – This ID is carried in the packet as metadata to provide security context at the destination host.
     – Policy enforcement cost becomes a single hashtable lookup regardless of the number of containers or policy complexity.
  10. Extensibility
  11. BPF – Berkeley Packet Filter
      [Diagram: in userspace, source code is compiled to byte code by LLVM/clang; in the kernel, the verifier + JIT turn the byte code into native instructions (add eax,edx; shl eax,2), attached at TC ingress and TC egress alongside the netdevices, network stack and sockets]
  12. BPF Maps & Perf Ring Buffer
      [Diagram: BPF programs in the kernel and userspace processes exchange data through BPF maps (hashtable, array) and the perf ring buffer; BPF programs are chained via tail calls]
  13. BPF Features (As of Aug 2016)
      ● Efficient data sharing via maps
        – Per-CPU/global arrays & hashtables
      ● Rewrite packet content
      ● Extend/trim packet size
      ● Redirect to other net_device
      ● Attachment of tunnel metadata
      ● Cgroups integration
      ● Access to high performance perf ring buffer
      ● …
  14. XDP – Express Data Path
      [Diagram: in userspace, source code is compiled to byte code by LLVM/clang; in the kernel, after the verifier + JIT, the program is attached in the driver with access to the DMA buffer, below the netdevice, network stack and sockets]
  15. Cilium Architecture
      [Diagram components: orchestration systems and plugins; Cilium layer with Cilium Daemon (policy repository, code generation), Cilium Monitor and Cilium CLI; kernel with eth0 and BPF programs (conntrack, policy); arrows labeled "bytecode injection" and "events"]
  16. Why is this awesome?
      On the fly BPF program generation means:
      ● Extensibility of userspace networking in the kernel
      ● MAC, IP, port number, … all become constants → compiler can optimize heavily!
      ● BPF programs can be recompiled and replaced without interrupting the container and its connections
        – Features can be compiled in/out at runtime with container granularity
      ● Access to fast BPF maps and perf ring buffer to interact with userspace
        – Drop monitor in n*Mpps context
        – Use notifications for policy learning, IDS, logging, ...
  17. Available Building Blocks
      ● L3 forwarding (IPv6 & IPv4)
      ● Host connectivity
      ● Encapsulation (VXLAN/Geneve/GRE)
      ● ICMPv6 generation
      ● NDisc & ARP responder
      ● Access Control
      Currently working on:
      ● Fragmentation handling
      ● Mobility
      ● Port Mapping (TCP/UDP)
      ● Connection tracking
      ● L3/L4 Load Balancer
      ● Statistics
      ● Events (perf ring buffer)
      ● Debugging framework
      ● NAT46
      ● End to end encryption
  18. Simplicity
      Networking should be invisible; it is not.
  19. Simplicity
      ● L3 only (Calico gets this right)
        – No L2 scaling issues, no broadcast domains, no L2 vulnerabilities
      ● No “Networks”
        – No need for containers to join multiple networks to access multiple isolation domains. No need for multiple addresses.
      ● Policy definition independent of addressing
        – As specified in Kubernetes Networking SIG
        – All policies based on container labels
  20. Performance
      [Chart: Container to container on local node; x-axis: # Cores (1 to 22); y-axis: Gbit (0 to 600)]
      netperf -t TCP_SENDFILE -H beef::aa0:18:ee5e
      1 TCP flow per core, 10’000 policies
      Intel Xeon 3.5 GHz Sandy Bridge, 24 cores
  21. Performance
      [Chart: Container to container over 10 Gbit NICs; x-axis: # Cores (1 to 22); y-axis: MBit (0 to 10000); series: 64, 128, 256, 512, 1024, 64000]
      netperf -t TCP_SENDFILE -H beef::aa0:18:ee5e
      1 TCP flow per core, 10’000 policies
      Intel Xeon 3.5 GHz Sandy Bridge, 24 cores
  22. <Insert Cool Demo Here>
  23. Q&A
      Image Sources:
      ● Cover (Toronto): Rick Harris (https://www.flickr.com/photos/rickharris/)
      ● The Invisible Man: Dr. Azzacov (https://www.flickr.com/photos/drazzacov/)
      Start hacking with BPF for containers: http://github.com/cilium/cilium
      Contact:
      Slack: cilium.slack.com
      Twitter: @tgraf__
      Mail: tgraf@tgraf.ch
      Team:
      ● André Martins
      ● Daniel Borkmann
      ● Madhu Challa
      ● Thomas Graf
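
The "IPv6 addresses with host scope allocator" idea from slide 3 can be pictured as each host owning its own IPv6 prefix and handing out container addresses from it locally, so hosts never need to coordinate on individual addresses. The following is a minimal sketch under stated assumptions, not Cilium's allocator: the documentation prefix `2001:db8::/48`, the one-/64-per-host layout, and the names `HostScopeAllocator` and `allocate` are all hypothetical.

```python
import ipaddress


class HostScopeAllocator:
    """Hands out container addresses from a prefix owned by one host.

    Hypothetical illustration; class and method names are not Cilium's.
    """

    def __init__(self, host_prefix: str):
        self.network = ipaddress.IPv6Network(host_prefix)
        self._next = 1  # skip the all-zero host part
        self._by_container = {}

    def allocate(self, container_id: str) -> ipaddress.IPv6Address:
        # Return the existing address if the container is already known.
        if container_id in self._by_container:
            return self._by_container[container_id]
        if self._next >= self.network.num_addresses:
            raise RuntimeError("host prefix exhausted")
        addr = self.network[self._next]
        self._next += 1
        self._by_container[container_id] = addr
        return addr


# Assumed layout: one /64 per host, carved out of a cluster-wide /48.
cluster = ipaddress.IPv6Network("2001:db8::/48")
host_prefixes = cluster.subnets(new_prefix=64)
host_a = HostScopeAllocator(str(next(host_prefixes)))
host_b = HostScopeAllocator(str(next(host_prefixes)))

addr_a = host_a.allocate("frontend-1")
addr_b = host_b.allocate("backend-1")
```

Because every address falls inside its host's prefix, routing between hosts only needs one route per host prefix, which is what makes containers addressable without NAT in this sketch.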

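Slides 8 and 9 describe label-based policy whose enforcement cost is a single hashtable lookup on a label ID. A rough sketch of that idea follows, assuming the policy is "LB may talk to FE, FE may talk to BE, and QA or Prod peers require the same environment label". The numeric IDs 10 to 15 come from slide 9, but which ID belongs to which label set is an assumption here, and all function names are hypothetical.

```python
from itertools import product

# Label sets from the slides. The ID range is from slide 9; the exact
# assignment of IDs to label sets is assumed for illustration.
LABEL_IDS = {
    frozenset({"LB", "QA"}): 10,
    frozenset({"FE", "QA"}): 11,
    frozenset({"BE", "QA"}): 12,
    frozenset({"LB", "Prod"}): 13,
    frozenset({"FE", "Prod"}): 14,
    frozenset({"BE", "Prod"}): 15,
}

ROLE_EDGES = {("LB", "FE"), ("FE", "BE")}  # who may talk to whom
ENVIRONMENTS = {"QA", "Prod"}              # labels that "require" a match


def policy_allows(src: frozenset, dst: frozenset) -> bool:
    """Evaluate the full policy once, when the table is built."""
    role_ok = any((s, d) in ROLE_EDGES for s, d in product(src, dst))
    env_ok = (src & ENVIRONMENTS) == (dst & ENVIRONMENTS)
    return role_ok and env_ok


def build_allow_table() -> set:
    """Precompute every allowed (source ID, destination ID) pair."""
    return {
        (src_id, dst_id)
        for (src, src_id), (dst, dst_id) in product(LABEL_IDS.items(), repeat=2)
        if policy_allows(src, dst)
    }


ALLOW_TABLE = build_allow_table()


def enforce(src_id: int, dst_id: int) -> bool:
    """Per-packet check: one hash lookup, independent of policy size."""
    return (src_id, dst_id) in ALLOW_TABLE
```

The policy is evaluated only when the table is built; the per-packet path sees just the source ID carried in the packet and the local destination ID, so adding containers with existing label sets does not grow the lookup cost.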
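
Slide 16 states that with on-the-fly program generation the MAC, IP and similar parameters "all become constants". One way to picture this is a generator that writes a small C header per container, which a BPF program would then be compiled against. The sketch below is only an illustration of that idea: the macro names (`CONTAINER_MAC`, `CONTAINER_IPV6`, `CONTAINER_LABEL_ID`), the function name, and the example MAC and label ID are invented, and the output format is not claimed to match what Cilium generates.

```python
import ipaddress


def render_container_header(mac: str, ipv6: str, label_id: int) -> str:
    """Render per-container parameters as C preprocessor constants.

    Macro names are hypothetical and chosen for illustration only.
    """
    mac_bytes = [int(part, 16) for part in mac.split(":")]
    if len(mac_bytes) != 6 or any(not 0 <= b <= 0xFF for b in mac_bytes):
        raise ValueError(f"invalid MAC address: {mac!r}")
    addr_bytes = ipaddress.IPv6Address(ipv6).packed  # 16 raw bytes

    mac_init = ", ".join(f"0x{b:02x}" for b in mac_bytes)
    ip_init = ", ".join(f"0x{b:02x}" for b in addr_bytes)
    return (
        "/* Generated per container; regenerate and recompile on change. */\n"
        f"#define CONTAINER_MAC {{ {mac_init} }}\n"
        f"#define CONTAINER_IPV6 {{ {ip_init} }}\n"
        f"#define CONTAINER_LABEL_ID {label_id}\n"
    )


# Example values: the IPv6 address is the one used in the netperf
# command on the performance slides; MAC and label ID are made up.
header = render_container_header("02:42:ac:11:00:02", "beef::aa0:18:ee5e", 11)
```

Since the values reach the compiler as literals rather than as map lookups or function arguments, comparisons against them can be folded at compile time, which is the optimization opportunity the slide points to.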