SlideShare a Scribd company logo
Replacing iptables with eBPF in
Kubernetes with Cilium
Cilium, eBPF, Envoy, Istio, Hubble
Michal Rostecki
Software Engineer
mrostecki@suse.com
mrostecki@opensuse.org
Swaminathan Vasudevan
Software Engineer
svasudevan@suse.com
22
What’s wrong with iptables?
3
IPtables runs into a couple of significant problems:
● Iptables updates must be made by recreating and updating all rules in a
single transaction.
● Implements chains of rules as a linked list, so almost all operations are O(n).
● The standard practice of implementing access control lists (ACLs) as
implemented by iptables was to use sequential list of rules.
● It’s based on matching IPs and ports, not aware about L7 protocols.
● Every time you have a new IP or port to match, rules need to be added and
the chain changed.
● Has high consumption of resources on Kubernetes.
What’s wrong with legacy iptables?
4
Complexity of iptables
● Linked list.
● All rules in the chain have to be replaced as a whole.
Rule 1
Rule 2
Rule n
...
Search O(n)
Insert O(1)
Delete O(n)
5
Kubernetes uses iptables for...
● kube-proxy - the component which implements Services and load
balancing by DNAT iptables rules
● the most of CNI plugins are using iptables for Network Policies
6
What is BPF?
7
HW Bridge OVS .
Netdevice / Drivers
Traffic Shaping
Ethernet
IPv4 IPv6
Netfilter
TCP UDP Raw
Sockets
System Call Interface
Process Process Process
● The Linux kernel stack is split into multiple abstraction
layers.
● Strong userspace API compatibility in Linux for years.
● This shows how complex the linux kernel is and its years
of evolution.
● This cannot be replaced in a short term.
● Very hard to bypass the layers.
● Netfilter module has been supported by linux for more
than two decades and packet filtering has to applied to
packets that moves up and down the stack.
Linux Network Stack
8
HW Bridge OVS .
Netdevice / Drivers
Traffic Shaping
Ethernet
IPv4 IPv6
Netfilter
TCP UDP Raw
Sockets
System Call Interface
Process Process Process
BPF System calls
BPF Sockmap and
Sockops
BPF TC hooks
BPF XDP
BPF kernel hooks
BPF cGroups
9
Mpps
10
PREROUTING INPUT OUTPUTFORWARD POSTROUTING
FILTER
FILTER FILTER
NAT
NAT
Routing
Decision
NAT
Routing
Decision
Routing
Decision
Netdev
(Physical or
virtual Device)
Netdev
(Physical or
virtual Device)
Local
Processes
eBPF
Code
eBPF
Code
IPTables
netfilter
hooks
eBPF
TC
hooks
XDP
hooks
BPF replaces IPtables
11
NetFilter NetFilter
To Linux
Stack
From Linux
Stack
Netdev
(Physical or
virtual Device)
Netdev
(Physical or
virtual Device)
Ingress
Chain
Selector
INGRESS
CHAIN
FORWARD
CHAIN
[local dst]
[rem
ote
dst]
TC/XDP Ingress
hook
TC Egress hook
Egress Chain
Selector
OUTPUT
CHAIN
[local src]
[remote
src]
Update
session
Label Packet
Update
session
Label Packet
Store
session
Store
session
Store
session
Update
session
Label Packet
Connection Tracking
BPF based filtering architecture
12
….
Headers
parsing
IP.dst
lookup
IP1 bitv1
IP2 bitv2
IP3 bitv3
eBPF Program #1 eBPF Program #2 eBPF Program #3
IP.proto
lookup
* bitv1
udp bitv2
tcp bitv3
Bitwise
AND
bit-vectors
Search
first
Matching
rule
Update
counters
ACTION
(drop/
accept)
rule1 act1
rule2 act2
rule3 act3
rule1 cnt1
rule2 cnt2
eBPF
Program
eBPF Program #N
Packet in
Packet out
From eBPF hook
To eBPF hook
Tailcall
Tailcall
Tailcall
Tailcall
Packet header offsets
Bitvector with temporary result
per cpu _array shared across the entire program chain
per cpu _array shared across the entire program chain
Each eBPF program can exploit a
different matching algorithm (e.g.,
exact match, longest prefix match,
etc).
Each eBPF program is
injected only if there are
rules operating on that
field.
LBVS is implemented
with a chain of eBPF
programs, connected
through tail calls.
Header parsing is done
once and results are kept
in a shared map for
performance reasons
BPF based tail calls
13
BPF goes into...
● Load balancers - katran
● perf
● systemd
● Suricata
● Open vSwitch - AF_XDP
● And many many others
14
BPF is used by...
1515
Cilium
16
What is Cilium?
17
CNI Functionality
CNI is a CNCF ( Cloud Native Computing Foundation) project for Linux Containers
It consists of specification and libraries for writing plugins.
Only care about networking connectivity of containers
● ADD/DEL
General container runtime considerations for CNI:
The container runtime must
● create a new network namespace for the container before invoking any plugins
● determine the network for the container and add the container to the each network by calling the corresponding plugins for each network
● not invoke parallel operations for the same container.
● order ADD and DEL operations for a container, such that ADD is always eventually followed by a corresponding DEL.
● not call ADD twice ( without a corresponding DEL ) for the same ( network name, container id, name of the interface inside the container).
When CNI ADD call is invoked it tries to add the network to the container with respective veth pairs and assigning IP address from the respective IPAM
Plugin or using the Host Scope.
When CNI DEL call is invoked it tries to remove the container network, release the IP Address to the IPAM Manager and cleans up the veth pairs.
18
Kubernetes API Server
Kubelet
CRI-Containerd
CNI-Plugin (Cilium)
Cilium Agent
eth0
BPF Maps
Container2
Container1
Linux Kernel
Network
Stack 000 c1 FE 0A
001 54 45 31
002 A1 B1 C1
004 32 66 AA
cni-add()..
Kubectl
K8s Pod
Userspace
Kernel
bpf_syscall()
BPF
Hook
Cilium CNI Plugin control Flow
19
Cilium Components with BPF hook points and BPF maps shown in
Linux Stack Orchestrator
20
container A container B container C
eth0 eth0 eth0
lxc0 lxc0 lxc1
eth0 eth0
21
Networking modes
Use case:
Cilium handling routing between nodes
Encapsulation
Use case:
Using cloud provider routers, using BGP
routing daemon
Direct routing
Node A
Node B
Node C
VXLAN
VXLAN
VXLAN
Node A
Node B Node C
Cloud or BGP
routing
22
23
24
L3 filtering – label based, ingress
Pod
Labels: role=frontend
IP: 10.0.0.1
Pod
Labels: role=frontend
IP: 10.0.0.2
Pod
IP: 10.0.0.5
Pod
Labels: role=backend
IP: 10.0.0.3
Pod
Labels: role=backend
IP: 10.0.0.4
allow
deny
25
L3 filtering – label based, ingress
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
description: "Allow frontends to access backends"
metadata:
name: "frontend-backend"
spec:
endpointSelector:
matchLabels:
role: backend
ingress:
- fromEndpoints:
- matchLabels:
class: frontend
26
L3 filtering – CIDR based, egress
IP: 10.0.1.1
Subnet: 10.0.1.0/24
IP: 10.0.2.1
Subnet: 10.0.2.0/24
allow
deny
Cluster A
Pod
Labels: role=backend
IP: 10.0.0.1
Any IP not belonging
to 10.0.1.0/24
27
L3 filtering – CIDR based, egress
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
description: "Allow backends to access 10.0.1.0/24"
metadata:
name: "frontend-backend"
spec:
endpointSelector:
matchLabels:
role: backend
egress:
- toCIDR:
- IP: “10.0.1.0/24”
28
L4 filtering
Pod
Labels: role=backend
IP: 10.0.0.1
allow
deny
TCP/80
Any other port
29
L4 filtering
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
description: "Allow to access backends only on TCP/80"
metadata:
name: "frontend-backend"
spec:
endpointSelector:
matchLabels:
role: backend
ingress:
- toPorts:
- ports:
- port: “80”
protocol: “TCP”
30
L7 filtering – API Aware Security
Pod
Labels: role=api
IP: 10.0.0.1
GET /articles/{id}
GET /private
Pod
IP: 10.0.0.5
31
L7 filtering – API Aware Security
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
description: "L7 policy to restict access to specific HTTP endpoints"
metadata:
name: "frontend-backend"
endpointSelector:
matchLabels:
role: backend
ingress:
- toPorts:
- ports:
- port: “80”
protocol: “TCP”
rules:
http:
- method: "GET"
path: "/article/$"
32
Standalone proxy, L7 filtering
Node A
Pod A
+ BPF
Envoy
Generating BPF programs for
L7 filtering through libcilium.so
Node B
Pod B
+ BPF
Envoy
Generating BPF programs for
L7 filtering through libcilium.so
Generating BPF programs
for L3/L4 filtering
Generating BPF programs
for L3/L4 filtering
VXLAN
33
Features
34
Cluster Mesh
Cluster A Cluster B
Node A
Pod A
+ BPF
Node B
+ BPF
Container
eth0
Pod B
Container
eth0
Pod C
Container
eth0
External etcd
Node A
Pod A
+ BPF
Container
eth0
35
Socket Socket Socket Socket
Service Service
Socket
TCP/IP
Ethernet
eth0
Socket
TCP/IP
Ethernet
eth0
Network
TCP/IP
Ethernet
IPtables
TCP/IP
Ethernet
IPtables
Loopback
IPtables IPtables
TCP/IP TCP/IP
Ethernet Ethernet
Loopback
36
Cilium CNI Cilium CNI
Socket Socket Socket Socket
Service Service
Socket
TCP/IP
Ethernet
eth0
Socket
TCP/IP
Ethernet
eth0
Network
37
Service A Service B Service C
38
Service A Service B
39
Service A Service B
External
Github
Service
External
Cloud
Network
40
Kubernetes Services
● Hash table.
BPF, Cilium
● Linked list.
● All rules in the chain have to be
replaced as a whole.
Iptables, kube-proxy
Key
Key
Key
Value
Value
Value
Rule 1
Rule 2
Rule n
...
Search O(1)
Insert O(1)
Delete O(1)
Search O(n)
Insert O(1)
Delete O(n)
41
usec
number of services in cluster
42
CNI chaining
Policy enforcement, load balancing,
multi-cluster connectivity
IP allocation, configuring network
interface, encapsulation/routing
inside the cluster
43
Native support for AWS ENI
44
●
●
●
●
●
●
●
45
●
○
●
○
●
○
●
○
46
47
48
4949
To sum it up
50
Why Cilium is awesome?
● It makes disadvantages of iptables disappear. And always gets the best
from the Linux kernel.
● Cluster Mesh / multi-cluster.
● Makes Istio faster.
● Offers L7 API Aware filtering as a Kubernetes resource.
● Integrates with the other popular CNI plugins – Calico, Flannel, Weave,
Lyft, AWS CNI.
Replacing iptables with eBPF in Kubernetes with Cilium

More Related Content

What's hot

eBPF maps 101
eBPF maps 101eBPF maps 101
eBPF maps 101
SUSE Labs Taipei
 
Cloud Native Networking & Security with Cilium & eBPF
Cloud Native Networking & Security with Cilium & eBPFCloud Native Networking & Security with Cilium & eBPF
Cloud Native Networking & Security with Cilium & eBPF
Raphaël PINSON
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Thomas Graf
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDP
lcplcp1
 
eBPF Workshop
eBPF WorkshopeBPF Workshop
eBPF Workshop
Michael Kehoe
 
Cilium - BPF & XDP for containers
 Cilium - BPF & XDP for containers Cilium - BPF & XDP for containers
Cilium - BPF & XDP for containers
Docker, Inc.
 
Linux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network SecurityLinux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network Security
Thomas Graf
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
Ray Jenkins
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDP
Thomas Graf
 
eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
SUSE Labs Taipei
 
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and SecurityCilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Thomas Graf
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
Brendan Gregg
 
Kubernetes Networking with Cilium - Deep Dive
Kubernetes Networking with Cilium - Deep DiveKubernetes Networking with Cilium - Deep Dive
Kubernetes Networking with Cilium - Deep Dive
Michal Rostecki
 
Cilium - Network security for microservices
Cilium - Network security for microservicesCilium - Network security for microservices
Cilium - Network security for microservices
Thomas Graf
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking Walkthrough
Thomas Graf
 
Introduction to eBPF
Introduction to eBPFIntroduction to eBPF
Introduction to eBPF
RogerColl2
 
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Valeriy Kravchuk
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
Adrien Mahieux
 
eBPF/XDP
eBPF/XDP eBPF/XDP
eBPF/XDP
Netronome
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
Thomas Graf
 

What's hot (20)

eBPF maps 101
eBPF maps 101eBPF maps 101
eBPF maps 101
 
Cloud Native Networking & Security with Cilium & eBPF
Cloud Native Networking & Security with Cilium & eBPFCloud Native Networking & Security with Cilium & eBPF
Cloud Native Networking & Security with Cilium & eBPF
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux Kernel
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDP
 
eBPF Workshop
eBPF WorkshopeBPF Workshop
eBPF Workshop
 
Cilium - BPF & XDP for containers
 Cilium - BPF & XDP for containers Cilium - BPF & XDP for containers
Cilium - BPF & XDP for containers
 
Linux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network SecurityLinux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network Security
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDP
 
eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
 
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and SecurityCilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
Kubernetes Networking with Cilium - Deep Dive
Kubernetes Networking with Cilium - Deep DiveKubernetes Networking with Cilium - Deep Dive
Kubernetes Networking with Cilium - Deep Dive
 
Cilium - Network security for microservices
Cilium - Network security for microservicesCilium - Network security for microservices
Cilium - Network security for microservices
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking Walkthrough
 
Introduction to eBPF
Introduction to eBPFIntroduction to eBPF
Introduction to eBPF
 
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
eBPF/XDP
eBPF/XDP eBPF/XDP
eBPF/XDP
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 

Similar to Replacing iptables with eBPF in Kubernetes with Cilium

Harmonia open iris_basic_v0.1
Harmonia open iris_basic_v0.1Harmonia open iris_basic_v0.1
Harmonia open iris_basic_v0.1
Yongyoon Shin
 
ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!
Affan Syed
 
Unifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPFUnifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPF
Netronome
 
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterKernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Anne Nicolas
 
Packet Walk(s) In Kubernetes
Packet Walk(s) In KubernetesPacket Walk(s) In Kubernetes
Packet Walk(s) In Kubernetes
Don Jayakody
 
Scaling the Container Dataplane
Scaling the Container Dataplane Scaling the Container Dataplane
Scaling the Container Dataplane
Michelle Holley
 
Introduction to TCP/IP
Introduction to TCP/IPIntroduction to TCP/IP
Introduction to TCP/IP
Frank Fang Kuo Yu
 
Scaling Kubernetes to Support 50000 Services.pptx
Scaling Kubernetes to Support 50000 Services.pptxScaling Kubernetes to Support 50000 Services.pptx
Scaling Kubernetes to Support 50000 Services.pptx
thaond2
 
IPTABLES Introduction
IPTABLES IntroductionIPTABLES Introduction
IPTABLES Introduction
HungWei Chiu
 
Netlink-Optimization.pptx
Netlink-Optimization.pptxNetlink-Optimization.pptx
Netlink-Optimization.pptx
KalimuthuVelappan
 
Protecting host with calico
Protecting host with calicoProtecting host with calico
Protecting host with calico
Anirban Sen Chowdhary
 
OSDC 2017 - Casey Callendrello -The evolution of the Container Network Interface
OSDC 2017 - Casey Callendrello -The evolution of the Container Network InterfaceOSDC 2017 - Casey Callendrello -The evolution of the Container Network Interface
OSDC 2017 - Casey Callendrello -The evolution of the Container Network Interface
NETWAYS
 
OSDC 2017 | The evolution of the Container Network Interface by Casey Callend...
OSDC 2017 | The evolution of the Container Network Interface by Casey Callend...OSDC 2017 | The evolution of the Container Network Interface by Casey Callend...
OSDC 2017 | The evolution of the Container Network Interface by Casey Callend...
NETWAYS
 
SRE NL MeetUp - eBPF.pdf
SRE NL MeetUp - eBPF.pdfSRE NL MeetUp - eBPF.pdf
SRE NL MeetUp - eBPF.pdf
SiteReliabilityEngin
 
iptables 101- bottom-up
iptables 101- bottom-upiptables 101- bottom-up
iptables 101- bottom-up
HungWei Chiu
 
Cilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDPCilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDP
Thomas Graf
 
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Hajime Tazaki
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
Ted Jung
 
Kubernetes networking
Kubernetes networkingKubernetes networking
Kubernetes networking
Sim Janghoon
 
Linux router
Linux routerLinux router

Similar to Replacing iptables with eBPF in Kubernetes with Cilium (20)

Harmonia open iris_basic_v0.1
Harmonia open iris_basic_v0.1Harmonia open iris_basic_v0.1
Harmonia open iris_basic_v0.1
 
ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!
 
Unifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPFUnifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPF
 
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterKernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
 
Packet Walk(s) In Kubernetes
Packet Walk(s) In KubernetesPacket Walk(s) In Kubernetes
Packet Walk(s) In Kubernetes
 
Scaling the Container Dataplane
Scaling the Container Dataplane Scaling the Container Dataplane
Scaling the Container Dataplane
 
Introduction to TCP/IP
Introduction to TCP/IPIntroduction to TCP/IP
Introduction to TCP/IP
 
Scaling Kubernetes to Support 50000 Services.pptx
Scaling Kubernetes to Support 50000 Services.pptxScaling Kubernetes to Support 50000 Services.pptx
Scaling Kubernetes to Support 50000 Services.pptx
 
IPTABLES Introduction
IPTABLES IntroductionIPTABLES Introduction
IPTABLES Introduction
 
Netlink-Optimization.pptx
Netlink-Optimization.pptxNetlink-Optimization.pptx
Netlink-Optimization.pptx
 
Protecting host with calico
Protecting host with calicoProtecting host with calico
Protecting host with calico
 
OSDC 2017 - Casey Callendrello -The evolution of the Container Network Interface
OSDC 2017 - Casey Callendrello -The evolution of the Container Network InterfaceOSDC 2017 - Casey Callendrello -The evolution of the Container Network Interface
OSDC 2017 - Casey Callendrello -The evolution of the Container Network Interface
 
OSDC 2017 | The evolution of the Container Network Interface by Casey Callend...
OSDC 2017 | The evolution of the Container Network Interface by Casey Callend...OSDC 2017 | The evolution of the Container Network Interface by Casey Callend...
OSDC 2017 | The evolution of the Container Network Interface by Casey Callend...
 
SRE NL MeetUp - eBPF.pdf
SRE NL MeetUp - eBPF.pdfSRE NL MeetUp - eBPF.pdf
SRE NL MeetUp - eBPF.pdf
 
iptables 101- bottom-up
iptables 101- bottom-upiptables 101- bottom-up
iptables 101- bottom-up
 
Cilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDPCilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDP
 
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
 
Kubernetes networking
Kubernetes networkingKubernetes networking
Kubernetes networking
 
Linux router
Linux routerLinux router
Linux router
 

Recently uploaded

GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 

Recently uploaded (20)

GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 

Replacing iptables with eBPF in Kubernetes with Cilium

  • 1. Replacing iptables with eBPF in Kubernetes with Cilium Cilium, eBPF, Envoy, Istio, Hubble Michal Rostecki Software Engineer mrostecki@suse.com mrostecki@opensuse.org Swaminathan Vasudevan Software Engineer svasudevan@suse.com
  • 3. 3 IPtables runs into a couple of significant problems: ● Iptables updates must be made by recreating and updating all rules in a single transaction. ● Implements chains of rules as a linked list, so almost all operations are O(n). ● The standard practice of implementing access control lists (ACLs) as implemented by iptables was to use sequential list of rules. ● It’s based on matching IPs and ports, not aware about L7 protocols. ● Every time you have a new IP or port to match, rules need to be added and the chain changed. ● Has high consumption of resources on Kubernetes. What’s wrong with legacy iptables?
  • 4. 4 Complexity of iptables ● Linked list. ● All rules in the chain have to be replaced as a whole. Rule 1 Rule 2 Rule n ... Search O(n) Insert O(1) Delete O(n)
  • 5. 5 Kubernetes uses iptables for... ● kube-proxy - the component which implements Services and load balancing by DNAT iptables rules ● the most of CNI plugins are using iptables for Network Policies
  • 7. 7 HW Bridge OVS . Netdevice / Drivers Traffic Shaping Ethernet IPv4 IPv6 Netfilter TCP UDP Raw Sockets System Call Interface Process Process Process ● The Linux kernel stack is split into multiple abstraction layers. ● Strong userspace API compatibility in Linux for years. ● This shows how complex the linux kernel is and its years of evolution. ● This cannot be replaced in a short term. ● Very hard to bypass the layers. ● Netfilter module has been supported by linux for more than two decades and packet filtering has to applied to packets that moves up and down the stack. Linux Network Stack
  • 8. 8 HW Bridge OVS . Netdevice / Drivers Traffic Shaping Ethernet IPv4 IPv6 Netfilter TCP UDP Raw Sockets System Call Interface Process Process Process BPF System calls BPF Sockmap and Sockops BPF TC hooks BPF XDP BPF kernel hooks BPF cGroups
  • 10. 10 PREROUTING INPUT OUTPUTFORWARD POSTROUTING FILTER FILTER FILTER NAT NAT Routing Decision NAT Routing Decision Routing Decision Netdev (Physical or virtual Device) Netdev (Physical or virtual Device) Local Processes eBPF Code eBPF Code IPTables netfilter hooks eBPF TC hooks XDP hooks BPF replaces IPtables
  • 11. 11 NetFilter NetFilter To Linux Stack From Linux Stack Netdev (Physical or virtual Device) Netdev (Physical or virtual Device) Ingress Chain Selector INGRESS CHAIN FORWARD CHAIN [local dst] [rem ote dst] TC/XDP Ingress hook TC Egress hook Egress Chain Selector OUTPUT CHAIN [local src] [remote src] Update session Label Packet Update session Label Packet Store session Store session Store session Update session Label Packet Connection Tracking BPF based filtering architecture
  • 12. 12 …. Headers parsing IP.dst lookup IP1 bitv1 IP2 bitv2 IP3 bitv3 eBPF Program #1 eBPF Program #2 eBPF Program #3 IP.proto lookup * bitv1 udp bitv2 tcp bitv3 Bitwise AND bit-vectors Search first Matching rule Update counters ACTION (drop/ accept) rule1 act1 rule2 act2 rule3 act3 rule1 cnt1 rule2 cnt2 eBPF Program eBPF Program #N Packet in Packet out From eBPF hook To eBPF hook Tailcall Tailcall Tailcall Tailcall Packet header offsets Bitvector with temporary result per cpu _array shared across the entire program chain per cpu _array shared across the entire program chain Each eBPF program can exploit a different matching algorithm (e.g., exact match, longest prefix match, etc). Each eBPF program is injected only if there are rules operating on that field. LBVS is implemented with a chain of eBPF programs, connected through tail calls. Header parsing is done once and results are kept in a shared map for performance reasons BPF based tail calls
  • 13. 13 BPF goes into... ● Load balancers - katran ● perf ● systemd ● Suricata ● Open vSwitch - AF_XDP ● And many many others
  • 14. 14 BPF is used by...
  • 17. 17 CNI Functionality CNI is a CNCF ( Cloud Native Computing Foundation) project for Linux Containers It consists of specification and libraries for writing plugins. Only care about networking connectivity of containers ● ADD/DEL General container runtime considerations for CNI: The container runtime must ● create a new network namespace for the container before invoking any plugins ● determine the network for the container and add the container to the each network by calling the corresponding plugins for each network ● not invoke parallel operations for the same container. ● order ADD and DEL operations for a container, such that ADD is always eventually followed by a corresponding DEL. ● not call ADD twice ( without a corresponding DEL ) for the same ( network name, container id, name of the interface inside the container). When CNI ADD call is invoked it tries to add the network to the container with respective veth pairs and assigning IP address from the respective IPAM Plugin or using the Host Scope. When CNI DEL call is invoked it tries to remove the container network, release the IP Address to the IPAM Manager and cleans up the veth pairs.
  • 18. 18 Kubernetes API Server Kubelet CRI-Containerd CNI-Plugin (Cilium) Cilium Agent eth0 BPF Maps Container2 Container1 Linux Kernel Network Stack 000 c1 FE 0A 001 54 45 31 002 A1 B1 C1 004 32 66 AA cni-add().. Kubectl K8s Pod Userspace Kernel bpf_syscall() BPF Hook Cilium CNI Plugin control Flow
  • 19. 19 Cilium Components with BPF hook points and BPF maps shown in Linux Stack Orchestrator
  • 20. 20 container A container B container C eth0 eth0 eth0 lxc0 lxc0 lxc1 eth0 eth0
  • 21. 21 Networking modes Use case: Cilium handling routing between nodes Encapsulation Use case: Using cloud provider routers, using BGP routing daemon Direct routing Node A Node B Node C VXLAN VXLAN VXLAN Node A Node B Node C Cloud or BGP routing
  • 22. 22
  • 23. 23
  • 24. 24 L3 filtering – label based, ingress Pod Labels: role=frontend IP: 10.0.0.1 Pod Labels: role=frontend IP: 10.0.0.2 Pod IP: 10.0.0.5 Pod Labels: role=backend IP: 10.0.0.3 Pod Labels: role=backend IP: 10.0.0.4 allow deny
  • 25. 25 L3 filtering – label based, ingress apiVersion: "cilium.io/v2" kind: CiliumNetworkPolicy description: "Allow frontends to access backends" metadata: name: "frontend-backend" spec: endpointSelector: matchLabels: role: backend ingress: - fromEndpoints: - matchLabels: class: frontend
  • 26. 26 L3 filtering – CIDR based, egress IP: 10.0.1.1 Subnet: 10.0.1.0/24 IP: 10.0.2.1 Subnet: 10.0.2.0/24 allow deny Cluster A Pod Labels: role=backend IP: 10.0.0.1 Any IP not belonging to 10.0.1.0/24
  • 27. 27 L3 filtering – CIDR based, egress apiVersion: "cilium.io/v2" kind: CiliumNetworkPolicy description: "Allow backends to access 10.0.1.0/24" metadata: name: "frontend-backend" spec: endpointSelector: matchLabels: role: backend egress: - toCIDR: - IP: “10.0.1.0/24”
  • 28. 28 L4 filtering Pod Labels: role=backend IP: 10.0.0.1 allow deny TCP/80 Any other port
  • 29. 29 L4 filtering apiVersion: "cilium.io/v2" kind: CiliumNetworkPolicy description: "Allow to access backends only on TCP/80" metadata: name: "frontend-backend" spec: endpointSelector: matchLabels: role: backend ingress: - toPorts: - ports: - port: “80” protocol: “TCP”
  • 30. 30 L7 filtering – API Aware Security Pod Labels: role=api IP: 10.0.0.1 GET /articles/{id} GET /private Pod IP: 10.0.0.5
  • 31. 31 L7 filtering – API Aware Security apiVersion: "cilium.io/v2" kind: CiliumNetworkPolicy description: "L7 policy to restict access to specific HTTP endpoints" metadata: name: "frontend-backend" endpointSelector: matchLabels: role: backend ingress: - toPorts: - ports: - port: “80” protocol: “TCP” rules: http: - method: "GET" path: "/article/$"
  • 32. 32 Standalone proxy, L7 filtering Node A Pod A + BPF Envoy Generating BPF programs for L7 filtering through libcilium.so Node B Pod B + BPF Envoy Generating BPF programs for L7 filtering through libcilium.so Generating BPF programs for L3/L4 filtering Generating BPF programs for L3/L4 filtering VXLAN
  • 34. 34 Cluster Mesh Cluster A Cluster B Node A Pod A + BPF Node B + BPF Container eth0 Pod B Container eth0 Pod C Container eth0 External etcd Node A Pod A + BPF Container eth0
  • 35. 35 Socket Socket Socket Socket Service Service Socket TCP/IP Ethernet eth0 Socket TCP/IP Ethernet eth0 Network TCP/IP Ethernet IPtables TCP/IP Ethernet IPtables Loopback IPtables IPtables TCP/IP TCP/IP Ethernet Ethernet Loopback
  • 36. 36 Cilium CNI Cilium CNI Socket Socket Socket Socket Service Service Socket TCP/IP Ethernet eth0 Socket TCP/IP Ethernet eth0 Network
  • 37. 37 Service A Service B Service C
  • 39. 39 Service A Service B External Github Service External Cloud Network
  • 40. 40 Kubernetes Services ● Hash table. BPF, Cilium ● Linked list. ● All rules in the chain have to be replaced as a whole. Iptables, kube-proxy Key Key Key Value Value Value Rule 1 Rule 2 Rule n ... Search O(1) Insert O(1) Delete O(1) Search O(n) Insert O(1) Delete O(n)
  • 42. 42 CNI chaining Policy enforcement, load balancing, multi-cluster connectivity IP allocation, configuring network interface, encapsulation/routing inside the cluster
  • 46. 46
  • 47. 47
  • 48. 48
  • 50. 50 Why Cilium is awesome? ● It makes disadvantages of iptables disappear. And always gets the best from the Linux kernel. ● Cluster Mesh / multi-cluster. ● Makes Istio faster. ● Offers L7 API Aware filtering as a Kubernetes resource. ● Integrates with the other popular CNI plugins – Calico, Flannel, Weave, Lyft, AWS CNI.