The Linux kernel is undergoing the most fundamental architecture evolution in history and is becoming a microkernel. Why is the Linux kernel evolving into a microkernel? The potentially biggest fundamental change ever happening to the Linux kernel. This talk covers how companies like Facebook and Google use BPF to patch 0-day exploits, how BPF will change the way features are added to the kernel forever, and how BPF is introducing a new type of application deployment method for the Linux kernel.
4. Programmability Essentials
4
Untrusted code runs
in the browser of the
user.
→ Sandboxing
Allow evolution of
logic without requiring
to constantly ship new
browser versions.
→ Deploy anytime with
seamless upgrades
Programmability must
be provided with
minimal overhead.
→ Native Execution
(JIT compiler)
Safety
Continuous
Delivery
Performance
6. Cons:
● You likely need to ship a different
module for each kernel version
● Might crash your kernel
● Change kernel source code
● Expose configuration API
● Wait 5 years for your users
to upgrade
6
Kernel Development 101
● Write kernel module
● Every kernel release will break it
Cons:
Option 1
Native Support
Option 2
Kernel Module
7. How about we add
JavaScript-like capabilities
to the Linux Kernel?
7
11. eBPF Hooks
11
Process
Storage
Hardware
Sockets
TCP/IP
Network Device
read()
File Descriptor
VFS
Block Device
write()
Linux
Kernel
Network
Hardware
Process
SyscallSyscall
Where can you hook? kernel functions (kprobes), userspace functions (uprobes), system calls,
fentry/fexit, tracepoints, network devices (tc/xdp), network routes, TCP congestion algorithms,
sockets (data level)
recvmsg()sendmsg()
12. eBPF Maps
12
Controller
Sockets
Linux
Kernel
TCP/IP
Network Device
Process
Syscall Syscall
Admin
BPF
Map
Syscall
Map Types:
- Hash tables, Arrays
- LRU (Least Recently Used)
- Ring Buffer
- Stack Trace
- LPM (Longest Prefix match)
What are Maps used for?
● Program state
● Program configuration
● Share data between programs
● Share state, metrics, and
statistics with user space
recvmsg()sendmsg()
13. eBPF Helpers
13
Sockets
Linux
Kernel
TCP/IP
Network Device
Process
Syscall
What helpers exist?
● Random numbers
● Get current time
● Map access
● Get process/cgroup context
● Manipulate network packets and
forwarding
● Access socket data
● Perform tail call
● Access process stack
● Access syscall arguments
● ...
[...]
num = bpf_get_prandom_u32();
[...]
recvmsg()sendmsg()
14. eBPF Tail and Function Calls
14
Linux
Kernel
What are Tail Calls used for?
● Chain programs together
● Split programs into independent
logical components
● Make BPF programs composable
What are Functions Calls used for?
● Reuse functionality inside of a
program
● Reduce program size (avoid
inlining)
15. 15
Community
287 contributors:
(Jan 2016 to Jan 2020)
● 466 Daniel Borkmann (Cilium; maintainer)
● 290 Andrii Nakryiko (Facebook)
● 279 Alexei Starovoitov (Facebook; maintainer)
● 217 Jakub Kicinski (Facebook)
● 173 Yonghong Song (Facebook)
● 168 Martin KaFai Lau (Facebook)
● 159 Stanislav Fomichev (Google)
● 148 Quentin Monnet (Cilium)
● 148 John Fastabend (Cilium)
● 118 Jesper Dangaard Brouer (Red Hat)
● [...]
22. Development
Program Maps
Runtime
Go Development Toolchain
22
clang -target bpf
Sockets
Linux
Kernel
TCP/IP
recvmsg()sendmsg()
Process
Verifier
JIT Compiler
Syscall
BPF
Program
C source
BPF
Program
bytecode
BPF
Map
Syscall
Go Library
Go Library: https://github.com/cilium/ebpf
23. 23
Outlook: Future of
is turning the Linux
kernel into a microkernel.
● An increasing amount of new kernel
functionality is implemented with eBPF.
● 100% modular and composable.
● New additions can evolve at a rapid pace.
Much quicker than normal kernel
development.
Example: The linux kernel is not aware of
containers and microservices (it only knows
about namespaces). Cilium is making the
Linux kernel container and Kubernetes
aware.
could enable the Linux kernel
hotpatching we always dreamed about.
Problem:
● Linux kernel vulnerability requires to
patch kernel.
● Rebooting 20’000 servers takes a very
long time without risking extensive
downtime.
Function
Function
Function
Hotfix
Linux
Kernel
24. Thank You
eBPF Maintainers
Daniel Borkmann, Alexei Starovoitov
Cilium Team
André Martins, Jarno Rajahalme, Joe Stringer,
John Fastabend, Maciej Kwiek, Martynas
Pumputis, Paul Chaignon, Quentin Monnet,
Ray Bejjani, Tobias Klauser
Facebook Team
Andrii Nakryiko, Andrey Ignatov, Jakub
Kicinski, Martin KaFai Lau, Roman Gushchin,
Song Liu, Yonghong Song
Google Team
Chenbo Feng, KP Singh, Lorenzo Colitti,
Maciej Żenczykowski, Stanislav Fomichev,
BCC & bpftrace
Alastair Robertson, Brendan Gregg, Brenden
Blanco
Kernel Team
Björn Töpel, David S. Miller, Edward Cree,
Jesper Brouer, Toke Høiland-Jørgensen
24
● BPF Getting Started Guide
BPF and XDP Reference Guide
● Cilium
github.com/cilium/cilium
● Twitter
@ciliumproject
● Contact the speaker
@tgraf__
All images: Pixabay