Understanding eBPF in a Hurry!

eBPF is an exciting new technology that is poised to transform Linux performance engineering. eBPF enables users to dynamically and programmatically trace any kernel or user space code path, safely and efficiently. However, understanding eBPF is not so simple. The goal of this talk is to give audiences a fundamental understanding of eBPF, how it interconnects existing Linux tracing technologies, and how it provides a powerful platform to solve any Linux performance problem.

Published in: Software

  1. 1. Understanding eBPF in a Hurry! LinkedIn Performance Engineering Meetup June 2019 Ray Jenkins
  2. 2. Hi, I’m Ray @_rayjenkins github.com/rjenkins ray@segment.com
  3. 3. Let’s say you have a performance problem.
  4. 4. Examples ● A developer claims boxes have “slow” I/O. ● Network connections are randomly terminated. ● Your service is crashing and you’re not sure why; maybe it is getting OOM killed? ● You think some process might be getting starved.
  5. 5. Someone suggests you might be able to solve it with eBPF.
  6. 6. Now you got two problems.
  7. 7. Goal: Can we understand what eBPF is and how it works?
  8. 8. http://www.brendangregg.com/ebpf.html This is our map
  9. 9. What is eBPF? (Extended Berkeley Packet Filter) ● Fast and safe, in-kernel, register based, bytecode VM. ● Designed to be JITed with direct mapping to x86_64 and other modern architectures. ● eBPF programs are “attached” to code paths within the kernel or user space programs and are executed when the code path is traversed. ● Linux Kernel 3.18 (2014) - bpf(2) syscall ○ (4.1 for Kprobes)
  10. 10. What is eBPF? … cont. ● Programs are written in restricted C and compiled with the eBPF backend for LLVM/Clang. ○ clang -O2 -emit-llvm -c bpf.c -o - | llc -march=bpf -filetype=obj -o bpf.o ● eBPF Verifier ○ Programs are verified to finish (no loops) and to contain no unreachable instructions, no reads of uninitialized registers, and no memory access through arbitrary pointers. Kernel function calls and data structure access are restricted. ● eBPF Maps / Perf Events Ring Buffer ○ Memory-mapped, bi-directional data structures for storage. They allow sharing of data between eBPF kernel programs, and also between kernel and user-space applications. ● Helper Functions ○ Kernel functions exposed to eBPF programs. ○ The available set depends on the type of eBPF program.
  11. 11. https://github.com/iovisor/bcc/blob/master/docs/kernel-versions.md
  12. 12. Why do we need eBPF?
  13. 13. Dynamically and Programmatically Trace Kernel or User Space Functions and Events, Safely and Efficiently.
  14. 14. http://www.brendangregg.com/ebpf.html This is our map YOU ARE HERE
  15. 15. eBPF is appealing to different people for different reasons, but its power resides in what you can attach it to. For Performance Engineering we’re primarily interested in these hooks. ● Kprobes/Uprobes ● Tracepoints ● USDT ● PerfEvents https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/bpf.h#L145
  16. 16. Tracepoints (2.6.32) - 2009 ● Static places in the kernel where tracing is inserted. ● $ grep -ri TRACE_EVENT * ● https://github.com/brendangregg/perf-tools
  17. 17. K/J(ret)probes (2.6.9) - 2004 / U(ret)probes (3.15) - 2014 ● Probe any instruction, dynamically. ● grep <func> /proc/kallsyms ● Registering a kprobe copies the probed instruction and inserts a breakpoint (int3 on x86_64). ● When the CPU hits the breakpoint, a trap occurs, registers are saved, and control passes to Kprobes. ● The pre-handler is called, Kprobes single-steps the copied instruction (slow), then the post-handler is called. ● CONFIG_OPTPROBES=y (enabled on x86_64)
  18. 18. https://vjordan.info/log/fpga/how-linux-kprobes-works.html
  19. 19. https://vjordan.info/log/fpga/how-linux-kprobes-works.html
  20. 20. Perf events (2.6.31) - 2009 ● The “nearly un-googleable” - http://web.eece.maine.edu/~vweaver/projects/perf_events/ ● Trace and count tracepoints and lower-level events: PMU and HW events (L1 cache store/load/miss etc). ● User space reads the data efficiently from the perf_events ring buffer.
  21. 21. USDT (BCC March 2016) ● Userland Statically Defined Tracepoints ● sudo ./tplist -l <library name>
  22. 22. http://www.brendangregg.com/ebpf.html This is our map YOU ARE HERE
  23. 23. sudo apt-get install bpfcc-tools
  24. 24. Single Purpose Tools
  25. 25. Multi-Purpose Tools
  26. 26. So what does it look like?
  27. 27. https://github.com/torvalds/linux/blob/master/samples/bpf/sock_example.c
  28. 28. Ayyy, lol 😂 jk
  29. 29. https://github.com/iovisor/bcc https://github.com/iovisor/gobpf BPF Compiler Collection (BCC) Python, Lua, Golang
  30. 30. Let’s Talk about the VM. First, Let’s Check our Map
  31. 31. YOU ARE IN 1992
  32. 32. https://www.tcpdump.org/papers/bpf-usenix93.pdf
  33. 33. tcpdump -ni eth0 ip and udp
  34. 34. tcpdump -ni eth0 ip and udp -d
  35. 35. [Diagram: tcpdump and libpcap in user space, bpf in the kernel. Labels: filter expression “tcp and udp”, bytecode, packets.]
  36. 36. BPF - Berkeley Packet Filter ● Bytecode, register based VM, with a limited instruction set ● Runs in-kernel, designed for fast packet filtering ● 32-bit operations (LOAD, STORE, ALU, BRANCH, RETURN), encoded as fixed-size 64-bit instructions ● Two 32-bit registers (A, X) and a hidden frame pointer
  37. 37. BPF bytecode for ‘tcpdump ip and udp’ (000) ldh [12] (load 2 bytes from packet, at offset 12) (001) jeq #0x800 jt 2 jf 5 (002) ldb [23] (load byte at offset 23) (003) jeq #0x11 jt 4 jf 5 (0x11 == 17) (004) ret #262144 (005) ret #0 https://blog.cloudflare.com/bpf-the-forgotten-bytecode/ http://www.networksorcery.com/enp/protocol/ip.htm
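The six instructions above are small enough to trace by hand. As a minimal sketch (not the kernel's implementation), the Python below models only the four opcodes used here and runs them against hand-built Ethernet/IPv4 frames. It uses the absolute jump targets that `tcpdump -d` prints rather than the relative offsets stored in real bytecode, and `run_filter` and `make_frame` are hypothetical names introduced for illustration.

```python
import struct

# (mnemonic, k, jt, jf) tuples mirroring the `tcpdump -d` listing.
# jt/jf are the absolute targets shown by tcpdump, not encoded offsets.
PROGRAM = [
    ("ldh", 12, 0, 0),      # (000) A = 2 bytes at offset 12 (EtherType)
    ("jeq", 0x800, 2, 5),   # (001) IPv4? continue at 2, otherwise 5
    ("ldb", 23, 0, 0),      # (002) A = byte at offset 23 (IP protocol)
    ("jeq", 0x11, 4, 5),    # (003) UDP (17)? continue at 4, otherwise 5
    ("ret", 262144, 0, 0),  # (004) accept: keep up to 262144 bytes
    ("ret", 0, 0, 0),       # (005) drop
]


def run_filter(program, packet):
    """Interpret the toy program; the return value is the snapshot length."""
    a = 0   # the accumulator register A
    pc = 0  # program counter
    while True:
        op, k, jt, jf = program[pc]
        if op == "ldh":
            a = struct.unpack_from("!H", packet, k)[0]
            pc += 1
        elif op == "ldb":
            a = packet[k]
            pc += 1
        elif op == "jeq":
            pc = jt if a == k else jf
        elif op == "ret":
            return k
        else:
            raise ValueError(f"unsupported opcode: {op}")


def make_frame(ethertype, ip_proto):
    """Build a 34-byte Ethernet + IPv4 header with only the tested fields set."""
    eth = bytes(12) + struct.pack("!H", ethertype)  # MACs zeroed, then EtherType
    ip = bytearray(20)
    ip[0] = 0x45      # version 4, header length 5 words
    ip[9] = ip_proto  # protocol field lands at frame offset 14 + 9 = 23
    return eth + bytes(ip)


udp_result = run_filter(PROGRAM, make_frame(0x0800, 17))
tcp_result = run_filter(PROGRAM, make_frame(0x0800, 6))
ipv6_result = run_filter(PROGRAM, make_frame(0x86DD, 17))
```

A nonzero return means the packet is accepted (truncated to that many bytes), and zero means it is dropped, so only the IPv4/UDP frame passes.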
  38. 38. http://www.brendangregg.com/ebpf.html This is our map YOU ARE HERE
  39. 39. eBPF - Extended Berkeley Packet Filter ● Bytecode, register based VM, with an extended instruction set ○ Designed to be JITed with direct mapping to x86_64 ● 64-bit instructions, and 10 general-purpose 64-bit registers plus a frame pointer ○ R0 - return value from in-kernel function, and exit value for eBPF program ○ R1 - R5 - arguments from eBPF program to in-kernel function ○ R6 - R9 - callee saved registers that in-kernel function will preserve ○ R10 - read-only frame pointer to access stack ● BPF_CALL ○ Zero-overhead calls to other kernel functions, with arguments passed in hardware registers ● BPF_MAPS ○ Bi-directional data structures for storage. They allow sharing of data between eBPF kernel programs, and also between kernel and user-space applications. ● Helper Functions ○ https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md ← Very Important!
  40. 40. eBPF - Extended Berkeley Packet Filter… cont ● Load programs via bpf(2) syscall (see: man bpf) ○ int bpf(int cmd, union bpf_attr *attr, unsigned int size); ● Cmd: BPF_PROG_LOAD ○ Verify and load an eBPF program, returning a new file descriptor associated with the program. The close-on-exec file descriptor flag (see fcntl(2)) is automatically enabled for the new file descriptor.
  41. 41. Can we learn more about the eBPF VM like we did with tcpdump?
  42. 42. http://www.brendangregg.com/ebpf.html This is our map YOU ARE HERE
  43. 43. https://github.com/iovisor/bpf-docs/blob/master/eBPF.md
  44. 44. 0xb7 (op) r1 (dst) imm: 72=114 (‘r’), 6c=108 (‘l’), 64=100 (‘d’), 0a=10 (‘\n’); imm->ascii = “rld\n”
  45. 45. 0x63 (op) r1 (src) r10 (dst) offset
  46. 46. 0x18 (op) r1 (dst) imm = “hello wo”
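The three slides above read fields out of raw eBPF instructions. As a rough sketch of that decoding, the Python below unpacks the fixed 8-byte layout (8-bit opcode, 4-bit dst, 4-bit src, 16-bit signed offset, 32-bit signed immediate, little-endian). The sample byte strings are assumptions chosen to match the fields named on the slides; in particular the store offset of -8 is illustrative, and `decode` and `decode_wide_imm` are hypothetical helper names.

```python
import struct


def decode(insn):
    """Split one 8-byte eBPF instruction into its fields."""
    opcode, regs, off, imm = struct.unpack("<BBhi", insn)
    return {
        "opcode": opcode,
        "dst": regs & 0x0F,  # low nibble is the destination register
        "src": regs >> 4,    # high nibble is the source register
        "off": off,
        "imm": imm,
    }


def decode_wide_imm(insn16):
    """0x18 spans two 8-byte slots; join both immediates into 8 raw bytes."""
    low = decode(insn16[:8])["imm"]
    high = decode(insn16[8:])["imm"]
    return struct.pack("<ii", low, high)


# Assumed sample encodings for the three instructions discussed above.
MOV_R1_IMM = bytes.fromhex("b7010000726c640a")  # r1 = imm
STORE_R1 = bytes.fromhex("631af8ff00000000")    # *(u32 *)(r10 + off) = r1
LDDW_R1 = bytes.fromhex("1801000068656c6c000000006f20776f")  # r1 = 64-bit imm

mov = decode(MOV_R1_IMM)
store = decode(STORE_R1)

# The immediate is little-endian, so its bytes read in string order.
mov_text = struct.pack("<i", mov["imm"]).decode("ascii")
wide_text = decode_wide_imm(LDDW_R1).decode("ascii")
```

Because the immediates are stored little-endian, repacking them yields the characters in the order they appear in the string, which is why the byte dump can be read directly as text.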
  47. 47. As you can imagine, the next four instructions copy “hello wo” into scratch space at offset -16, copy a 0 into r1, and then store that 0 at offset -4. Finally, we copy the address of the variable from the frame pointer in r10 into r1.
  48. 48. To prepare for the call to int bpf_trace_printk(const char *fmt, u32 fmt_size, ...), we need to point r1 to the variable (which is -16 bytes from the frame pointer) and store in r2 the size of “hello world\n\0” = 13 bytes.
  49. 49. 0x85 is a function call with an imm of 6. We need to look that up in bpf.h to figure out which helper it is.
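To make the stack layout concrete, here is a small Python model of the 16 bytes below the frame pointer. It is only a sketch: the -16 and -4 offsets come from the slides, while the -8 offset for the middle four bytes is inferred from the string being contiguous. `store` is a hypothetical helper that writes at a negative offset from r10.

```python
FMT = b"hello world\n\0"  # the format string, including its terminator
STACK_SIZE = 16           # bytes of scratch space below the frame pointer


def store(stack, offset, data):
    """Write data at a negative offset relative to the frame pointer (r10)."""
    start = len(stack) + offset
    stack[start:start + len(data)] = data


stack = bytearray(STACK_SIZE)
store(stack, -8, b"rld\n")      # 32-bit store (offset inferred, see above)
store(stack, -16, b"hello wo")  # 64-bit store
store(stack, -4, b"\0")         # single zero byte terminating the string

r1_offset = -16  # r1 = r10 - 16, the address of the string
r2 = len(FMT)    # the size argument passed to bpf_trace_printk

# Read the string back starting from where r1 points.
start = STACK_SIZE + r1_offset
string_on_stack = bytes(stack[start:start + r2])
```

Reading `r2` bytes from the address in `r1` recovers the whole format string, which is what the helper receives.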
  50. 50. [Figure: the helper function list in bpf.h, with entries counted 0 1 2 3 4 5 6 to find helper number 6.]
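The counting on this slide can be expressed as a lookup. The sketch below lists the first seven helper IDs in the order they appear in the kernel's bpf.h and resolves the immediate of a call instruction against it. The sample call bytes are an assumed encoding for illustration, and `HELPER_NAMES` is a hypothetical name.

```python
import struct

# First seven entries of the helper enum in include/uapi/linux/bpf.h.
HELPER_NAMES = [
    "unspec",           # 0
    "map_lookup_elem",  # 1
    "map_update_elem",  # 2
    "map_delete_elem",  # 3
    "probe_read",       # 4
    "ktime_get_ns",     # 5
    "trace_printk",     # 6
]

# Assumed encoding of the call: opcode 0x85, immediate 6.
CALL = bytes.fromhex("8500000006000000")

opcode, regs, off, imm = struct.unpack("<BBhi", CALL)
helper = "bpf_" + HELPER_NAMES[imm]
```

With an immediate of 6 the lookup lands on `bpf_trace_printk`, matching the call the previous slides were preparing arguments for.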
  51. 51. Lastly we set our return value in r0 = 0 and exit with opcode 0x95.
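Going the other direction, the final two instructions can be assembled by hand. This is a minimal sketch of the 8-byte encoding, assuming the standard little-endian layout; `encode` is a hypothetical helper, and 0xb7 is taken to be the same move-immediate opcode seen earlier in the walkthrough.

```python
import struct


def encode(opcode, dst=0, src=0, off=0, imm=0):
    """Pack one eBPF instruction into its 8-byte little-endian form."""
    regs = (src << 4) | dst  # dst in the low nibble, src in the high nibble
    return struct.pack("<BBhi", opcode, regs, off, imm)


mov_r0_zero = encode(0xB7, dst=0, imm=0)  # r0 = 0, the program's return value
exit_insn = encode(0x95)                  # exit

epilogue_hex = (mov_r0_zero + exit_insn).hex()
```

Every eBPF program ends with an exit, and the value left in r0 at that point is what the kernel sees as the program's result.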
  52. 52. http://www.brendangregg.com/ebpf.html This is our map YOU ARE HERE
  53. 53. eBPF Maps
  54. 54. Helper Functions ● https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h ● https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md ● int bpf_probe_read(void *dst, int size, const void *src) ← all reads of arbitrary memory must go through this call ● int bpf_probe_read_str(void *dst, int size, const void *src) ● u64 bpf_ktime_get_ns(void) ● u64 bpf_get_current_pid_tgid(void) ● bpf_get_current_comm(char *buf, int size_of_buf) ● BPF_PERF_OUTPUT(name) ● int perf_submit((void *)ctx, (void *)data, u32 data_size) ● Map Functions ○ *val map.lookup(&key), *val map.lookup_or_init(&key, &zero), map.delete(&key), map.update(&key, &val), map.increment(key[, increment_amount])
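One helper whose return value often needs unpacking is bpf_get_current_pid_tgid. It returns a single u64 with the thread group ID (what user space calls the PID) in the upper 32 bits and the kernel's per-thread ID in the lower 32 bits. The sketch below shows the split in plain Python on a made-up sample value; `split_pid_tgid` is a hypothetical name.

```python
def split_pid_tgid(pid_tgid):
    """Split the u64 from bpf_get_current_pid_tgid() into (tgid, tid)."""
    tgid = pid_tgid >> 32        # process ID as seen from user space
    tid = pid_tgid & 0xFFFFFFFF  # thread ID (the kernel's "pid")
    return tgid, tid


# Made-up sample: process 4242, thread 4250.
sample = (4242 << 32) | 4250
process_id, thread_id = split_pid_tgid(sample)
```

Inside an eBPF program the same split is done with a shift and a cast, and the upper half is usually what you want when filtering or keying a map by process.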
  55. 55. Segment Use Cases
  56. 56. segmentio/netsniff - tw: @julien_fabre / gh: @pryz
  57. 57. segmentio/ebpf ● Golang eBPF “Collectors”. ● CLI + ebpfd agent processes configuration and starts eBPF programs. ● Stats aggregation, publishing to observers, 3rd party stats forwarding (datadog etc.). ● Docker / pid -> container/service resolution.
  58. 58. segmentio/ebpf
  59. 59. Thank You! Questions?
  60. 60. References ● https://lwn.net/Articles/740157/ - A thorough introduction to eBPF ● https://lwn.net/Articles/599755/ - BPF: the universal in-kernel virtual machine ● https://www.collabora.com/news-and-blog/blog/2019/04/15/an-ebpf-overview-part-2-machine-and-bytecode/ ● https://www.youtube.com/watch?v=2lbtr85Yrs4 - Kernel Tracing with eBPF ● https://www.kernel.org/doc/Documentation/networking/filter.txt - Linux Socket Filtering aka Berkeley Packet Filter ● http://www.brendangregg.com/ebpf.html - Linux Extended BPF (eBPF) Tracing Tools ● https://www.slideshare.net/vh21/meet-cutebetweenebpfandtracing - Meet cute between eBPF and tracing ● https://blog.cloudflare.com/bpf-the-forgotten-bytecode/ - BPF the forgotten bytecode ● https://www.oreilly.com/learning/using-linux-tracing-tools - Modern Linux Tracing Landscape ● https://lwn.net/Articles/742082/ - An introduction to the BPF Compiler Collection ● https://bolinfest.github.io/opensnoop-native/ - How I ended up writing opensnoop in pure C using eBPF ● https://lwn.net/Articles/753601/ - Using user-space tracepoints with BPF ● http://brendangregg.com/perf.html - Perf Examples
