It's been three years since Netflix's Brendan Gregg described eBPF, the extended Berkeley Packet Filter, as "Superpowers for Linux". Since then there has been an explosion of capabilities and tools based on eBPF, so you've probably heard the term, but do you know what it is and how to use it? In this demo-rich talk we'll explore some of the powerful things we can do with this technology, especially in the context of containers.
2. eBPF
“Superpowers have finally come to Linux”
- Brendan Gregg, Netflix
“eBPF does to Linux what JavaScript does to HTML”
@lizrice
3. eBPF
Run code in the kernel
without having to write a kernel module
4. man bpf
The bpf() system call performs a range of operations related to extended
Berkeley Packet Filters. Extended BPF (or eBPF) is similar to the
original ("classic") BPF (cBPF) used to filter network packets.
For both cBPF and eBPF programs,
the kernel statically analyzes the programs before loading them, in
order to ensure that they cannot harm the running system.
eBPF extends cBPF in multiple ways, including the ability to call a
fixed set of in-kernel helper functions
and access shared data structures such as eBPF maps.
5. man bpf
eBPF programs can be written in a restricted C that is compiled (using the
clang compiler) into eBPF bytecode. Various features are omitted from this
restricted C, such as loops, global variables, variadic functions,
floating-point numbers, and passing structures as function arguments.
(limited) C → eBPF bytecode
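To make "eBPF bytecode" concrete: each eBPF instruction is a fixed 8-byte record (opcode, destination and source registers, a 16-bit offset and a 32-bit immediate), following struct bpf_insn in the kernel's uapi header linux/bpf.h. The sketch below hand-assembles the two instructions that a program which simply returns 0 compiles to (r0 = 0; exit). The helper name encode_insn is hypothetical, and the register-nibble layout assumes a little-endian host such as x86_64.

```python
import struct

# Opcode building blocks from the kernel's uapi headers
BPF_ALU64 = 0x07  # instruction class: 64-bit arithmetic
BPF_JMP = 0x05    # instruction class: jumps, calls and exit
BPF_MOV = 0xB0    # ALU operation: move
BPF_K = 0x00      # source operand: the 32-bit immediate
BPF_EXIT = 0x90   # JMP operation: return from the program


def encode_insn(code, dst=0, src=0, off=0, imm=0):
    """Pack one 8-byte instruction (hypothetical helper): u8 opcode,
    u8 registers (dst in the low nibble, src in the high nibble),
    s16 offset, s32 immediate, little-endian."""
    return struct.pack("<BBhi", code, (src << 4) | dst, off, imm)


# A program that just returns 0 is two instructions: r0 = 0; exit
program = encode_insn(BPF_ALU64 | BPF_MOV | BPF_K, dst=0, imm=0) + encode_insn(BPF_JMP | BPF_EXIT)
for i in range(0, len(program), 8):
    print(program[i:i + 8].hex())
# b700000000000000
# 9500000000000000
```

This is the same byte sequence clang would emit into the ELF object's program section for such a function; the opcode byte combines an instruction class with an operation and a source flag.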
6. clang & LLVM
“The LLVM Project is a collection of modular and reusable compiler and
toolchain technologies. Despite its name, LLVM has little to do with
traditional virtual machines. The name "LLVM" itself is not an acronym; it is
the full name of the project.”
“Clang is an ‘LLVM native’ C/C++/Objective-C compiler, which aims to
deliver amazingly fast compiles”
llvm.org
7. man bpf
The kernel contains a just-in-time (JIT) compiler that translates eBPF
bytecode into native machine code for better performance.
(limited) C → eBPF bytecode → machine code
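Without the JIT, the kernel has to interpret the bytecode one instruction at a time. The toy interpreter below (a hypothetical run function that understands only four opcodes, with no verifier, no memory access and no helper calls) sketches what such an interpretation loop does; the JIT's job is to replace a loop like this with equivalent native instructions.

```python
import struct

# Opcodes for the four instructions this toy understands (values as in linux/bpf.h)
MOV64_IMM = 0xB7  # BPF_ALU64 | BPF_MOV | BPF_K: dst = imm
ADD64_IMM = 0x07  # BPF_ALU64 | BPF_ADD | BPF_K: dst += imm
MUL64_IMM = 0x27  # BPF_ALU64 | BPF_MUL | BPF_K: dst *= imm
EXIT = 0x95       # BPF_JMP | BPF_EXIT: return the value in r0
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


def run(program: bytes) -> int:
    """Interpret a tiny subset of eBPF bytecode and return r0."""
    regs = [0] * 11  # r0..r10
    pc = 0
    while True:
        # struct.error is raised if execution runs past the end; in the kernel
        # it is the verifier that guarantees a program reaches an exit instead
        code, reg_byte, _off, imm = struct.unpack_from("<BBhi", program, pc * 8)
        dst = reg_byte & 0x0F  # destination register: low nibble
        if code == MOV64_IMM:
            regs[dst] = imm & MASK64  # the 32-bit immediate is sign-extended
        elif code == ADD64_IMM:
            regs[dst] = (regs[dst] + imm) & MASK64
        elif code == MUL64_IMM:
            regs[dst] = (regs[dst] * imm) & MASK64
        elif code == EXIT:
            return regs[0]
        else:
            raise ValueError(f"unsupported opcode {code:#x} at instruction {pc}")
        pc += 1


def insn(code: int, dst: int = 0, imm: int = 0) -> bytes:
    """Pack one instruction with src register and offset left at 0."""
    return struct.pack("<BBhi", code, dst, 0, imm)


# r0 = 6; r0 *= 7; exit
print(run(insn(MOV64_IMM, 0, 6) + insn(MUL64_IMM, 0, 7) + insn(EXIT)))  # 42
```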
9. bcc
“BCC makes BPF programs easier to write, with kernel instrumentation in C
(and includes a C wrapper around LLVM), and front-ends in Python and lua.”
github.com/iovisor/bcc
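11. Hello world with bcc
#!/usr/bin/python
from bcc import BPF

# eBPF program in restricted C; the doubled backslash keeps \n as an
# escape sequence in the C source instead of a literal line break
prog = """
int my_prog(void *ctx) {
    bpf_trace_printk("Hello world\\n");
    return 0;
}
"""

b = BPF(text=prog)
# Newer kernels (4.17+) prefix syscall entry points (e.g. __x64_sys_clone);
# get_syscall_fnname() resolves the name used by the running kernel
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="my_prog")
b.trace_print()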
Use strace to see the system calls the script makes, including bpf()
17. eBPF maps
Maps are a generic data structure for storage of
different types of data. They allow sharing of
data between eBPF kernel programs, and also
between kernel and user-space applications.
Each map type has the following attributes:
* type
* maximum number of elements
* key size in bytes
* value size in bytes
BPF_MAP_TYPE_UNSPEC
BPF_MAP_TYPE_HASH
BPF_MAP_TYPE_ARRAY
BPF_MAP_TYPE_PROG_ARRAY
BPF_MAP_TYPE_PERF_EVENT_ARRAY
BPF_MAP_TYPE_PERCPU_HASH
BPF_MAP_TYPE_PERCPU_ARRAY
BPF_MAP_TYPE_STACK_TRACE
BPF_MAP_TYPE_CGROUP_ARRAY
BPF_MAP_TYPE_LRU_HASH
BPF_MAP_TYPE_LRU_PERCPU_HASH
BPF_MAP_TYPE_LPM_TRIE
BPF_MAP_TYPE_ARRAY_OF_MAPS
BPF_MAP_TYPE_HASH_OF_MAPS
BPF_MAP_TYPE_DEVMAP
BPF_MAP_TYPE_SOCKMAP
BPF_MAP_TYPE_CPUMAP
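The four attributes above fix a map's shape when it is created. As a rough model of how a BPF_MAP_TYPE_HASH behaves, here is a hypothetical Python class that enforces the key size, value size and maximum number of elements, and mimics the update flags (BPF_ANY, BPF_NOEXIST, BPF_EXIST, with the values defined in linux/bpf.h) along with the negative errno results of the kernel's hash-map code. It is a sketch of the semantics only: a real map lives in kernel memory and is reached through bpf() system calls or helper functions.

```python
import errno

# Update flags, with the values defined in linux/bpf.h
BPF_ANY = 0      # create a new element or update an existing one
BPF_NOEXIST = 1  # create only; fail if the key already exists
BPF_EXIST = 2    # update only; fail if the key does not exist


class ToyHashMap:
    """Hypothetical model of a BPF_MAP_TYPE_HASH map: key size, value size and
    maximum number of elements are fixed when the map is created."""

    def __init__(self, key_size, value_size, max_entries):
        self.key_size = key_size
        self.value_size = value_size
        self.max_entries = max_entries
        self._data = {}

    def _check_size(self, buf, size):
        if len(buf) != size:
            raise ValueError(f"expected {size} bytes, got {len(buf)}")

    def lookup(self, key):
        """Return the value, or None where the kernel would return NULL."""
        self._check_size(key, self.key_size)
        return self._data.get(key)

    def update(self, key, value, flags=BPF_ANY):
        """Return 0 on success or a negative errno code."""
        self._check_size(key, self.key_size)
        self._check_size(value, self.value_size)
        exists = key in self._data
        if flags == BPF_NOEXIST and exists:
            return -errno.EEXIST
        if flags == BPF_EXIST and not exists:
            return -errno.ENOENT
        if not exists and len(self._data) >= self.max_entries:
            return -errno.E2BIG  # the map is full
        self._data[key] = value
        return 0

    def delete(self, key):
        self._check_size(key, self.key_size)
        if self._data.pop(key, None) is None:
            return -errno.ENOENT
        return 0


# A u64 counter keyed by a u64 syscall number (56 is clone() on x86_64)
counts = ToyHashMap(key_size=8, value_size=8, max_entries=1024)
key = (56).to_bytes(8, "little")
counts.update(key, (1).to_bytes(8, "little"))
print(int.from_bytes(counts.lookup(key), "little"))  # 1
```

Keys and values are opaque byte strings of fixed size, which is why the kernel only needs to know their sizes, not their types.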
19. ELF object file → kernel
User space: an ELF object file containing
○ eBPF opcodes
○ eBPF maps
Kernel: verifier, BPF vm, maps
User space drives the kernel side, mostly through bpf() system calls:
● BPF_MAP_CREATE and BPF_PROG_LOAD: create the maps and load the program (the verifier checks it before it can run in the BPF vm)
● Attach BPF program to event
● Read / write maps: BPF_MAP_GET_NEXT_KEY, BPF_MAP_LOOKUP_ELEM, BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM
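The map commands are enough for user space to walk a whole map: start with no key, ask for the next key, look it up, and repeat until the kernel answers ENOENT. The sketch below models that loop against a hypothetical in-memory ToyMap whose get_next_key and lookup_elem methods imitate BPF_MAP_GET_NEXT_KEY and BPF_MAP_LOOKUP_ELEM. It does not call bpf() itself, and a real hash map returns its keys in no particular order.

```python
import errno


class ToyMap:
    """Hypothetical in-memory stand-in for a kernel map (no bpf() calls involved)."""

    def __init__(self, items):
        self._data = dict(items)

    def get_next_key(self, key):
        """Mimic BPF_MAP_GET_NEXT_KEY: returns (error, next_key). A missing or
        unknown key yields the first key; the last key yields -ENOENT."""
        keys = list(self._data)
        if key not in self._data:
            return (0, keys[0]) if keys else (-errno.ENOENT, None)
        i = keys.index(key)
        if i + 1 < len(keys):
            return 0, keys[i + 1]
        return -errno.ENOENT, None

    def lookup_elem(self, key):
        """Mimic BPF_MAP_LOOKUP_ELEM: returns (error, value)."""
        if key in self._data:
            return 0, self._data[key]
        return -errno.ENOENT, None


def dump(m):
    """Collect every entry by chaining GET_NEXT_KEY calls until ENOENT."""
    entries = {}
    err, key = m.get_next_key(None)  # no starting key: ask for the first one
    while err == 0:
        lookup_err, value = m.lookup_elem(key)
        if lookup_err == 0:  # the entry may be deleted between the two calls
            entries[key] = value
        err, key = m.get_next_key(key)
    return entries


print(dump(ToyMap({56: 3, 57: 1})))  # {56: 3, 57: 1}
```

Because each step is a separate system call, the kernel-side program can keep updating the map while user space is walking it, which is why the lookup result is checked rather than assumed.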
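24. Counting clone() calls in a map
#!/usr/bin/python
from bcc import BPF
from time import sleep

program = """
BPF_HASH(syscalls);

int hello(void *ctx) {
    u64 counter = 0;
    u64 key = 56;  // 56 is the syscall number of clone() on x86_64
    u64 *p;

    p = syscalls.lookup(&key);
    // The verifier rejects the program if p is dereferenced without this NULL check
    if (p != 0) {
        counter = *p;
    }
    counter++;
    syscalls.update(&key, &counter);
    return 0;
}
"""

b = BPF(text=program)
# Resolve the kernel's actual entry-point name for clone() (e.g. __x64_sys_clone)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")

while True:
    sleep(3)
    # Keys and values come back as ctypes integers; .value gives a Python int
    for k, v in b["syscalls"].items():
        print(k.value, v.value)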
25. eBPF helper functions
These helpers are used by eBPF programs to interact with the system, or
with the context in which they work. For instance, they can be used to
print debugging messages, to get the time since the system was booted, to
interact with eBPF maps, or to manipulate network packets.
bpf_trace_printk()
bpf_map_*_elem()
bpf_get_current_pid_tgid()
...
github.com/iovisor/bpf-docs/blob/master/bpf_helpers.rst
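As an example of a helper's return value, bpf_get_current_pid_tgid() packs two IDs into one u64. In kernel terms the upper 32 bits are the tgid (what user space calls the process ID) and the lower 32 bits are the pid (the thread ID). The few lines below, a hypothetical helper written in Python for consistency with the other sketches, show how to unpack it; inside an eBPF program the process ID is typically obtained with bpf_get_current_pid_tgid() >> 32.

```python
def split_pid_tgid(pid_tgid: int):
    """Split the u64 from bpf_get_current_pid_tgid() into (tgid, pid)."""
    tgid = pid_tgid >> 32        # upper 32 bits: user space's "process ID"
    pid = pid_tgid & 0xFFFFFFFF  # lower 32 bits: the kernel's pid, i.e. the thread ID
    return tgid, pid


# A thread with ID 1240 inside process 1234
print(split_pid_tgid((1234 << 32) | 1240))  # (1234, 1240)
```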
26. Verifier
Each eBPF program is a set of instructions that is safe to run until
its completion. An in-kernel verifier statically determines that the
eBPF program terminates and is safe to execute.
● No loops (since Linux 5.3, bounded loops are accepted)
● No bad pointer dereferences
● Restricted program size
● Always exits
See what happens if you try to dereference a pointer without checking that it’s not NULL
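Some of the verifier rules above can be illustrated with a deliberately strict toy checker. The hypothetical toy_verify below is nowhere near the real verifier, which also tracks register types, pointer bounds and NULL checks, but it applies three of the listed rules to raw bytecode: a size limit (BPF_MAXINSNS, 4096 instructions), no backward jumps (so no loops), and no way to fall off the end of the program (so it always exits). It only inspects the BPF_JMP instruction class and ignores pointer checks entirely.

```python
import struct

BPF_MAXINSNS = 4096  # classic instruction limit, from linux/bpf_common.h
BPF_JMP = 0x05       # instruction class for jumps, calls and exit
BPF_CALL = 0x80      # JMP operation: call a helper function
BPF_EXIT = 0x90      # JMP operation: return from the program


def toy_verify(program: bytes) -> None:
    """Raise ValueError if the program breaks one of the toy rules."""
    if len(program) % 8:
        raise ValueError("program is not a whole number of 8-byte instructions")
    n = len(program) // 8
    if not 1 <= n <= BPF_MAXINSNS:
        raise ValueError(f"{n} instructions; must be between 1 and {BPF_MAXINSNS}")
    for pc in range(n):
        code, _regs, off, _imm = struct.unpack_from("<BBhi", program, pc * 8)
        if (code & 0x07) != BPF_JMP or (code & 0xF0) in (BPF_CALL, BPF_EXIT):
            continue  # not a jump: execution just moves to the next instruction
        target = pc + 1 + off  # offsets are relative to the following instruction
        if not 0 <= target < n:
            raise ValueError(f"jump out of range at instruction {pc}")
        if target <= pc:
            raise ValueError(f"back-edge from instruction {pc} to {target}")
    # With only forward jumps, every path ends at an exit or falls off the end;
    # requiring the last instruction to be exit rules out falling off.
    if program[(n - 1) * 8] != (BPF_JMP | BPF_EXIT):
        raise ValueError("last instruction is not exit")


def insn(code: int, off: int = 0) -> bytes:
    """Pack one instruction with zero registers and immediate."""
    return struct.pack("<BBhi", code, 0, off, 0)


toy_verify(insn(0xB7) + insn(0x95))  # r0 = 0; exit: accepted
```

Rejecting every backward jump is what makes termination easy to prove here: the program counter only ever moves forward, so no path can run longer than the program.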
27. “The eBPF
validator’s muse is
a fickle miscreant
with a very short
attention span”
- Jeff Dileo & Andy Olsen, NCC Group
30. What an eBPF program can’t do
A packet filter can drop packets, but you can’t drop or fail a function call
31. seccomp-bpf
Blacklisting / whitelisting system calls
e.g. Docker’s default seccomp profile
Uses (classic) BPF:
● Can’t dereference syscall arguments
● No eBPF maps to communicate with userspace
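To see why classic BPF can't dereference syscall arguments, it helps to look at what a seccomp filter actually receives: a struct seccomp_data holding the syscall number, the architecture, the instruction pointer and six raw u64 arguments. The sketch below builds a small whitelisting filter from classic BPF instructions (opcode values from linux/filter.h, verdicts from linux/seccomp.h) and runs it with a toy Python interpreter standing in for the kernel. The function names are hypothetical, the syscall numbers assume x86_64, and a real filter would be installed with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) or seccomp(2).

```python
import errno
import struct

# Classic BPF opcodes (linux/filter.h)
LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS: A = 32-bit word at offset k of seccomp_data
JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K: skip jt instructions if A == k, else skip jf
RET_K = 0x06     # BPF_RET | BPF_K: return k as the filter's verdict
# Verdicts (linux/seccomp.h) and architecture tag (linux/audit.h)
SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E


def seccomp_data(nr, args=(0,) * 6, arch=AUDIT_ARCH_X86_64):
    """Pack struct seccomp_data: int nr; u32 arch; u64 instruction_pointer; u64 args[6]."""
    return struct.pack("=iIQ6Q", nr, arch, 0, *args)


def whitelist_filter(allowed_nrs):
    """Build (code, jt, jf, k) instructions allowing only the given syscall numbers."""
    m = len(allowed_nrs)
    prog = [
        (LD_W_ABS, 0, 0, 4),               # A = arch (offset 4)
        (JEQ_K, 1, 0, AUDIT_ARCH_X86_64),  # expected arch: skip the kill
        (RET_K, 0, 0, SECCOMP_RET_KILL),
        (LD_W_ABS, 0, 0, 0),               # A = nr (offset 0)
    ]
    for j, nr in enumerate(allowed_nrs):
        prog.append((JEQ_K, m - j, 0, nr))  # match: jump to the ALLOW at the end
    prog.append((RET_K, 0, 0, SECCOMP_RET_ERRNO | errno.EPERM))
    prog.append((RET_K, 0, 0, SECCOMP_RET_ALLOW))
    return prog


def run_filter(prog, data):
    """Toy stand-in for the kernel running a classic BPF filter."""
    a, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        if code == LD_W_ABS:
            (a,) = struct.unpack_from("=I", data, k)
            pc += 1
        elif code == JEQ_K:
            pc += 1 + (jt if a == k else jf)
        elif code == RET_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")


prog = whitelist_filter([0, 1, 231])  # read, write, exit_group on x86_64
# openat(AT_FDCWD, path, ...): AT_FDCWD (-100) as a u64, then a hypothetical
# address for the path string; the filter sees only the number, not the string
data = seccomp_data(257, args=(2**64 - 100, 0x7FFD0000, 0, 0, 0, 0))
print(hex(run_filter(prog, data)))  # 0x50001 (SECCOMP_RET_ERRNO | EPERM)
```

The filter's only loads are from fixed offsets in seccomp_data, so the path argument is just an address to it: there is no instruction that follows that pointer into the process's memory.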
32. Landlock
In-development Linux Security Module
Like seccomp, but using eBPF
● Unprivileged process can set up its own sandbox (~ seccomp rules++)
● Configure on the fly using eBPF maps
● Cgroup aware
● Access to kernel objects, so eBPF code can make more granular decisions
landlock.io
33. A few references
IO Visor Project - iovisor.org
Brendan Gregg - brendangregg.com
O’Reilly book “Linux Observability with BPF” - David Calavera and Lorenzo Fontana