Cilium
Networking & Security for Containers with BPF & XDP
Docker Distributed Systems Summit
Thomas Graf
The Network becomes the Application bus
We have to deal with networks that ...
○ contain millions of endpoints
○ are noisy (nMpps)
○ are insecure with multiple tenants
○ operate unreliably
○ are constantly evolving with respect to protocols
Cilium Architecture
What is BPF?
BPF Code Generation at Container Startup
● Generate networking code at container startup
○ Tailored to each individual container
○ Leads to minimal code required
⇒ faster
⇒ smaller attack surface (unikernel like)
● Majority of configuration (IP, MAC, ports, ...) becomes
constant, so the compiler can optimize heavily
● Regeneration at runtime without breaking connections
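The idea of turning per-container configuration into compile-time constants can be sketched as follows. This is only an illustration, not Cilium's actual generator: the function name render_endpoint_header, the macro names, and the header layout are all hypothetical.

```python
import ipaddress
from typing import Iterable


def render_endpoint_header(ipv6: str, mac: str, allowed_ports: Iterable[int]) -> str:
    """Render a C header in which one container's settings are constants.

    All macro names below are hypothetical and exist only for illustration.
    """
    addr = ipaddress.IPv6Address(ipv6)  # raises ValueError on a bad address
    mac_bytes = [int(part, 16) for part in mac.split(":")]
    if len(mac_bytes) != 6 or any(b > 0xFF for b in mac_bytes):
        raise ValueError(f"invalid MAC address: {mac}")

    def c_array(values):
        # Format a byte sequence as a C array initializer.
        return "{ " + ", ".join(f"0x{v:02x}" for v in values) + " }"

    lines = [
        "/* Generated at container startup (hypothetical layout). */",
        f"#define ENDPOINT_IPV6 {c_array(addr.packed)}",
        f"#define ENDPOINT_MAC {c_array(mac_bytes)}",
    ]
    # Only the ports this container actually uses end up in the code.
    for port in sorted(set(allowed_ports)):
        lines.append(f"#define ALLOW_PORT_{port} 1")
    return "\n".join(lines) + "\n"
```

A header like this would be compiled together with the datapath source by a compiler with a BPF backend (for example clang), so that branches depending on these constants can be folded away. Regenerating at runtime would mean rendering a new header and recompiling.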
Make all tasks globally addressable on the
Internet
● Global IPv6 addresses
○ No NAT!
○ Native IPv4/NAT46 + NAT for compat
● Host scope address allocator
○ Lockless allocation
● Task mobility
○ ILA
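A minimal sketch of the two addressing ideas above, under stated assumptions. "Host scope" is read here as: each host owns its own IPv6 prefix and hands out container addresses locally, so no cluster-wide lock is needed. The ILA part only shows the basic split of an address into a 64-bit locator and a 64-bit identifier; it ignores details of the real ILA proposal such as checksum handling. The class and function names are hypothetical.

```python
import ipaddress
from itertools import count


class HostScopeAllocator:
    """Hands out container addresses from a prefix owned by a single host."""

    def __init__(self, host_prefix: str):
        self.network = ipaddress.IPv6Network(host_prefix)
        self._next = count(1)  # start at 1 to skip the all-zero identifier

    def allocate(self) -> ipaddress.IPv6Address:
        offset = next(self._next)
        if offset >= self.network.num_addresses:
            raise RuntimeError("host prefix exhausted")
        return self.network[offset]


IDENTIFIER_MASK = (1 << 64) - 1        # lower 64 bits: who the task is
LOCATOR_MASK = IDENTIFIER_MASK << 64   # upper 64 bits: where the task runs


def ila_relocate(addr: ipaddress.IPv6Address, new_locator: str) -> ipaddress.IPv6Address:
    """Keep the identifier, swap the locator (simplified view of ILA)."""
    identifier = int(addr) & IDENTIFIER_MASK
    locator = int(ipaddress.IPv6Network(new_locator).network_address) & LOCATOR_MASK
    return ipaddress.IPv6Address(locator | identifier)
```

For example, a task allocated on the host owning f00d:0:0:a::/64 keeps its identifier when it moves to the host owning f00d:0:0:b::/64; only the upper half of the address changes.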
Scaling Policy Specification
● How to specify policy for millions of endpoints?
● Decouple policy specification from addressing
○ IP+port ACLs are unsuitable for containers
○ Policy specification based on container labels
[Diagram: containers labeled Frontend (FE), LB, and Backend (BE)]
[Diagram: the Frontend (FE), LB, and Backend (BE) containers additionally labeled Prod or QA, with "requires" rules attached to the Prod and QA labels]
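Label-based policy can be sketched without any reference to IP addresses or ports. The rule types and the concrete rules used in the example (LB may talk to FE, FE may talk to BE, Prod requires Prod, QA requires QA) are assumptions read off the diagram labels, not a documented policy format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class AllowRule:
    to_labels: FrozenSet[str]    # receiver must carry all of these labels
    from_labels: FrozenSet[str]  # sender must carry all of these labels


@dataclass(frozen=True)
class RequiresRule:
    subject: FrozenSet[str]      # applies to receivers carrying these labels
    required: FrozenSet[str]     # sender must then carry these labels


def allowed(src: Set[str], dst: Set[str],
            allows: Iterable[AllowRule],
            requires: Iterable[RequiresRule]) -> bool:
    """Decide whether a sender with labels src may reach a receiver with labels dst."""
    # A violated "requires" rule rejects the traffic outright.
    for rule in requires:
        if rule.subject <= dst and not rule.required <= src:
            return False
    # Otherwise at least one allow rule has to match.
    return any(r.to_labels <= dst and r.from_labels <= src for r in allows)
```

With these rules, an FE container labeled Prod reaches a BE container labeled Prod, while an FE container labeled QA does not, no matter which addresses the containers happen to have.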
Scaling Policy Enforcement
● Distributed fixed cost policy enforcement
○ Per-CPU BPF-map hashtable
[Diagram: Cluster Wide Label ID Table. Label sets built from FE, BE, LB, Prod, and QA are assigned numeric IDs (10 to 16).]
This ID is carried in the network packet and used
to reconstruct the label context at the receiving
host.
Policy enforcement cost is reduced to a
single hashtable lookup regardless of
complexity.
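The fixed-cost enforcement described above can be sketched as follows. A Python dict and set stand in for the cluster-wide table and the per-CPU BPF hashtable; the class and function names, and the choice to start numbering at 10, are assumptions for illustration only.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set


class LabelIDTable:
    """Maps each distinct set of labels to one small numeric ID."""

    def __init__(self, first_id: int = 10):
        self._ids: Dict[FrozenSet[str], int] = {}
        self._next = first_id

    def id_for(self, labels: Iterable[str]) -> int:
        key = frozenset(labels)
        if key not in self._ids:
            self._ids[key] = self._next
            self._next += 1
        return self._ids[key]

    def items(self):
        return self._ids.items()


def build_policy_map(table: LabelIDTable, dst_labels: Iterable[str],
                     is_allowed: Callable[[FrozenSet[str], FrozenSet[str]], bool]) -> Set[int]:
    """Evaluate the (possibly complex) policy once, ahead of time, per receiver."""
    dst = frozenset(dst_labels)
    return {ident for labels, ident in table.items() if is_allowed(labels, dst)}


def enforce(policy_map: Set[int], packet_src_id: int) -> bool:
    """Per-packet work: one hash lookup on the ID carried in the packet."""
    return packet_src_id in policy_map
```

However complicated is_allowed is, it runs only when the policy map is built. The per-packet path stays a single membership test on the sender's ID.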
Extensibility & Safety in the Kernel
● Decouple datapath functionality from kernel version
○ Support new protocols
○ Add arbitrary statistics
○ Safety guaranteed by Verifier
● All at runtime for already running containers
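To give a feel for what "safety guaranteed by Verifier" means, here is a toy static check in the same spirit. It is not the kernel's verifier, which tracks far more (register state, memory accesses, and so on). The instruction format is invented for this sketch, and the check covers only one property: a program is accepted only if it cannot loop and cannot run past its end.

```python
from typing import List, Optional, Tuple

Insn = Tuple[str, int]  # (opcode, argument); invented format for this sketch


def verify(program: List[Insn]) -> Optional[str]:
    """Return None if the program is accepted, otherwise a rejection reason."""
    if not program:
        return "empty program"
    for pc, (opcode, arg) in enumerate(program):
        if opcode == "jmp":
            # Jump offsets are relative to the next instruction.
            target = pc + 1 + arg
            if arg < 0:
                return f"insn {pc}: backward jump (possible loop)"
            if target >= len(program):
                return f"insn {pc}: jump out of bounds"
    if program[-1][0] != "exit":
        return "program may fall off the end"
    # Only forward, in-bounds jumps and a final exit: execution must terminate.
    return None
```

Because the check runs before the program is loaded, new datapath code can be attached to already running containers without trusting its author to have avoided such mistakes.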
Scaling the Delivery of Cat Pictures
● Distributed L3/L4 LB w/ DSR
● Like IPVS but completely programmable
● LB for N-S, E-W & Intra-node
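[Diagram: ECMP spreads traffic over several LB nodes in front of the FE and BE containers. The small HTTP GET passes through the LB; the large cat pictures/videos travel on the return path.]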
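A minimal sketch of L3/L4 load balancing with direct server return (DSR), under simplifying assumptions: packets are plain Python objects, the backend is chosen by hashing the flow tuple, and the "via" field exists only to show which nodes a packet crossed. All names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    via: Tuple[str, ...] = ()  # nodes traversed, for illustration only


def select_backend(pkt: Packet, backends: List[str]) -> str:
    """Hash the flow so every LB node picks the same backend for it."""
    key = f"{pkt.src}:{pkt.sport}->{pkt.dst}:{pkt.dport}".encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def lb_forward(pkt: Packet, backends: List[str]) -> Tuple[str, Packet]:
    """LB node: choose a backend and forward the small request unchanged."""
    backend = select_backend(pkt, backends)
    forwarded = Packet(pkt.src, pkt.dst, pkt.sport, pkt.dport, via=pkt.via + ("lb",))
    return backend, forwarded


def backend_reply(request: Packet) -> Packet:
    """Backend: answer the client directly, using the service address as source."""
    return Packet(request.dst, request.src, request.dport, request.sport,
                  via=("backend",))
```

Because the backend choice depends only on the flow tuple, ECMP may deliver a flow's packets to any LB node and they still reach the same backend. The large response never crosses the LB, which is the point of DSR.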
Performance
Demo
Q&A
Start hacking on BPF for containers:
https://github.com/cilium/cilium
Slack: cilium.slack.com
Twitter: @tgraf__
Thank You
Building Blocks
● L3 forwarding (IPv6 & IPv4)
● Host connectivity
● Encapsulation (VXLAN/Geneve/GRE)
● ICMPv6 & ICMP generation
● NDisc & ARP responder
● Access Control
● Port mapping
● Connection tracking
● L3/L4 Load balancer w/ DSR
● Statistics
● Events (perf ring buffer)
● Debugging framework
● NAT46
