eBPF Trace from Kernel to Userspace

Mar. 7, 2016

  1. eBPF Trace from Kernel to Userspace. Gary Lin, Software Engineer, SUSE Labs. Technology Sharing Day 2016
  2. Tracer
  3. Example tracer output: a kernel call trace followed by a register dump.
      tick_nohz_idle_enter, set_cpu_sd_state_idle, up_write, __tick_nohz_idle_enter, ktime_get, uprobe_mmap, read_hpet, vma_set_page_prot, vma_wants_writenotify, rcu_needs_cpu, fput, get_next_timer_interrupt, _raw_spin_lock, hrtimer_get_next_event, _raw_spin_lock_irqsave, _raw_spin_unlock_irqrestore, syscall_trace_leave, _raw_write_unlock_irqrestore, __audit_syscall_exit, path_put, dput, mntput, up_write
      rax: 0x0000000000000000  rbx: 0xffff88012b5a5a28  rcx: 0xffff8800987c18e0  rdx: 0x0000000000000000  rsi: 0xffff88012b439f20  rdi: 0xffff88012b464628  rbp: 0xffff8800959e3d98
  4. kprobe (kernel) and uprobe (userspace):
      /sys/kernel/debug/tracing/kprobe_events
      /sys/kernel/debug/tracing/uprobe_events
  5. eBPF
  6. BPF?
  7. Berkeley Packet Filter
  8. BPF No Red BPF Program
  9. The BSD Packet Filter: A New Architecture for User-level Packet Capture December 19, 1992
  10. SCO lawsuit, August 2003
  11. Old
  12. Stable
  13. BPF ASM
      ldh [12]
      jne #0x800, drop
      ldb [23]
      jneq #1, drop
      # get a random uint32 number
      ld rand
      mod #4
      jneq #1, drop
      ret #-1
      drop: ret #0
  14. BPF Bytecode
      struct sock_filter code[] = {
          { 0x28, 0, 0, 0x0000000c },
          { 0x15, 0, 8, 0x000086dd },
          { 0x30, 0, 0, 0x00000014 },
          { 0x15, 2, 0, 0x00000084 },
          { 0x15, 1, 0, 0x00000006 },
          { 0x15, 0, 17, 0x00000011 },
          { 0x28, 0, 0, 0x00000036 },
          { 0x15, 14, 0, 0x00000016 },
          { 0x28, 0, 0, 0x00000038 },
          { 0x15, 12, 13, 0x00000016 },
          ...
      };
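The four fields of each entry above are (code, jt, jf, k): a 16-bit opcode, two 8-bit jump offsets, and a 32-bit generic operand. A small Python sketch of that layout (the field order follows struct sock_filter from the kernel UAPI; little-endian byte order is assumed here for a deterministic result, the kernel struct itself is native-endian):

```python
import struct

# Each classic BPF instruction is a 'struct sock_filter':
#   __u16 code;  /* opcode */
#   __u8  jt;    /* jump offset if true */
#   __u8  jf;    /* jump offset if false */
#   __u32 k;     /* generic operand (offset, immediate, ...) */
SOCK_FILTER = struct.Struct("<HBBI")  # 8 bytes per instruction

def pack_insn(code, jt, jf, k):
    """Pack one sock_filter entry into its 8-byte binary form."""
    return SOCK_FILTER.pack(code, jt, jf, k)

# First instruction on the slide: { 0x28, 0, 0, 0x0000000c }.
# Opcode 0x28 is BPF_LD | BPF_H | BPF_ABS, i.e. "ldh [12]": load the
# halfword at packet offset 12 (the EtherType field) into the accumulator.
insn = pack_insn(0x28, 0, 0, 0x0000000c)
print(insn.hex())  # -> 280000000c000000
```

This is the binary form a userspace program hands to the kernel, e.g. via setsockopt(SO_ATTACH_FILTER) for classic BPF.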
  15. Virtual Machine (kind of)
  16. BPF JIT
  17. BPF JIT: BPF Bytecode → Native Machine Code
  18. $ find arch/ -name bpf_jit*
      arch/sparc/net/bpf_jit_comp.c
      arch/sparc/net/bpf_jit_asm.S
      arch/sparc/net/bpf_jit.h
      arch/arm/net/bpf_jit_32.c
      arch/arm/net/bpf_jit_32.h
      arch/arm64/net/bpf_jit_comp.c
      arch/arm64/net/bpf_jit.h
      arch/powerpc/net/bpf_jit_comp.c
      arch/powerpc/net/bpf_jit_asm.S
      arch/powerpc/net/bpf_jit.h
      arch/s390/net/bpf_jit_comp.c
      arch/s390/net/bpf_jit.S
      arch/s390/net/bpf_jit.h
      arch/mips/net/bpf_jit.c
      arch/mips/net/bpf_jit_asm.S
      arch/mips/net/bpf_jit.h
      arch/x86/net/bpf_jit_comp.c
      arch/x86/net/bpf_jit.S
  19. Stable and Efficient
  20. eBPF
  21. Extended BPF
  22. eBPF: a userspace program loads an eBPF program into the kernel with BPF_PROG_LOAD (at most 4096 instructions).
  23. Extended Registers eBPF Verifier eBPF Map Probe Event
  24. Extended Registers eBPF Verifier eBPF Map Probe Event
  25. Classic BPF: 32-bit. Extended BPF: 64-bit.
  26. Classic BPF: A, X (2 registers). Extended BPF: R0 – R9 (10 registers) plus R10 (read-only).
  27. For x86_64 JIT: R0 → rax, R1 → rdi, R2 → rsi, R3 → rdx, R4 → rcx, R5 → r8, R6 → rbx, R7 → r13, R8 → r14, R9 → r15, R10 → rbp
  28. BPF Calling Convention
      ● R0: return value from in-kernel function, and exit value for the eBPF program
      ● R1 – R5: arguments from the eBPF program to an in-kernel function
      ● R6 – R9: callee-saved registers that in-kernel functions will preserve
      ● R10: read-only frame pointer to access the stack
  29. Extended Registers eBPF Verifier eBPF Map Probe Event
  30. Two-Step Verification
  31. Step 1 Directed Acyclic Graph Check
  32. Loops Unreachable Instructions
  33. Loops Unreachable Instructions
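Step 1 can be illustrated in miniature: treat each instruction as a node and each jump as an edge, then reject back-edges (loops) and instructions no path from instruction 0 can reach. This toy sketch is only an illustration of the idea, not the kernel verifier's actual code:

```python
def check_cfg(insns):
    """insns: list of (op, target) tuples; op is 'jmp', 'cond' or 'exit'.
    Depth-first search from instruction 0: a back-edge means a loop,
    and any instruction never visited is unreachable."""
    state = [0] * len(insns)   # 0 = unvisited, 1 = on DFS stack, 2 = done

    def successors(i):
        op, target = insns[i]
        if op == "exit":
            return []
        if op == "jmp":
            return [target]
        return [i + 1, target]  # conditional: fall through or jump

    def dfs(i):
        state[i] = 1
        for s in successors(i):
            if state[s] == 1:
                raise ValueError("back-edge: loop detected")
            if state[s] == 0:
                dfs(s)
        state[i] = 2

    dfs(0)
    if any(s == 0 for s in state):
        raise ValueError("unreachable instruction")

# A loop-free program where every instruction is reachable passes:
check_cfg([("cond", 2), ("jmp", 2), ("exit", None)])
# A self-loop such as [("jmp", 0)] would raise "back-edge: loop detected".
```

The real verifier performs this check on the eBPF instruction array before moving on to step 2.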
  34. Step 2 Simulate the Execution
  35. Read a never-written register. Do arithmetic on two valid pointers. Load/store registers of invalid types. Read the stack before writing data into it.
  36. Read a never-written register. Do arithmetic on two valid pointers. Load/store registers of invalid types. Read the stack before writing data into it.
  37. Extended Registers eBPF Verifier eBPF Map Probe Event
  38. The user program (userspace) and the eBPF program (kernel) share data through a map, which userspace accesses via the BPF_MAP_* syscall commands.
  39. eBPF Map Types ● BPF_MAP_TYPE_HASH ● BPF_MAP_TYPE_ARRAY ● BPF_MAP_TYPE_PROG_ARRAY ● BPF_MAP_TYPE_PERF_EVENT_ARRAY
  40. eBPF Map Syscalls ● BPF_MAP_CREATE ● BPF_MAP_LOOKUP_ELEM ● BPF_MAP_UPDATE_ELEM ● BPF_MAP_DELETE_ELEM ● BPF_MAP_GET_NEXT_KEY
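All of the commands above are sub-commands of the single bpf(2) syscall. A hedged Python sketch of what a BPF_MAP_CREATE request looks like (command and map-type numbers taken from include/uapi/linux/bpf.h around kernel 4.1; double-check against your own headers before relying on them):

```python
import ctypes
import enum

# Sub-commands of the bpf(2) syscall (include/uapi/linux/bpf.h, ~4.1).
class BpfCmd(enum.IntEnum):
    BPF_MAP_CREATE = 0
    BPF_MAP_LOOKUP_ELEM = 1
    BPF_MAP_UPDATE_ELEM = 2
    BPF_MAP_DELETE_ELEM = 3
    BPF_MAP_GET_NEXT_KEY = 4
    BPF_PROG_LOAD = 5

class BpfMapType(enum.IntEnum):
    BPF_MAP_TYPE_UNSPEC = 0
    BPF_MAP_TYPE_HASH = 1
    BPF_MAP_TYPE_ARRAY = 2
    BPF_MAP_TYPE_PROG_ARRAY = 3
    BPF_MAP_TYPE_PERF_EVENT_ARRAY = 4

# The BPF_MAP_CREATE arm of union bpf_attr: four __u32 fields.
class BpfMapCreateAttr(ctypes.Structure):
    _fields_ = [("map_type", ctypes.c_uint32),
                ("key_size", ctypes.c_uint32),
                ("value_size", ctypes.c_uint32),
                ("max_entries", ctypes.c_uint32)]

# A hash map with u64 keys and values, room for 1024 entries:
attr = BpfMapCreateAttr(map_type=BpfMapType.BPF_MAP_TYPE_HASH,
                        key_size=8, value_size=8, max_entries=1024)
# On Linux this attr would be passed as
#   syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
# which returns a file descriptor for the new map.
```

Libraries such as bcc wrap this syscall, so map creation is normally implicit in macros like BPF_HISTOGRAM or BPF_HASH.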
  41. Extended Registers eBPF Verifier eBPF Map Probe Event
  42. New ioctl request PERF_EVENT_IOC_SET_BPF
  43. Kprobe
  44. The user program loads the eBPF program into the kernel with BPF_PROG_LOAD (returning an fd), opens a kprobe perf event (another fd), and attaches the program to the event with the PERF_EVENT_IOC_SET_BPF ioctl.
  45. Registration
      perf_tp_event_init()           kernel/events/core.c
      → perf_trace_init()            kernel/trace/trace_event_perf.c
      → perf_trace_event_init()      kernel/trace/trace_event_perf.c
      → perf_trace_event_reg()       kernel/trace/trace_event_perf.c
            ret = tp_event->class->reg(tp_event, TRACE_REG_PERF_REGISTER, NULL);
      → kprobe_register()            kernel/trace/trace_kprobe.c
      → enable_trace_kprobe()        kernel/trace/trace_kprobe.c
      → enable_kprobe()              kernel/kprobes.c
  46. Attach
      perf_ioctl()                   kernel/events/core.c
      → _perf_ioctl()                kernel/events/core.c
            case PERF_EVENT_IOC_SET_BPF:
                return perf_event_set_bpf_prog(event, arg);
      → perf_event_set_bpf_prog()    kernel/events/core.c
            prog = bpf_prog_get(prog_fd);
            event->tp_event->prog = prog;
  47. Dispatch Event
      kprobe_dispatcher()            kernel/trace/trace_kprobe.c
      → kprobe_perf_func()           kernel/trace/trace_kprobe.c
            if (prog && !trace_call_bpf(prog, regs))
                return;
      → trace_call_bpf()             kernel/trace/bpf_trace.c
      → BPF_PROG_RUN()               include/linux/filter.h
      → __bpf_prog_run()             kernel/bpf/core.c
  48. kprobe example (bpf_tracer.c): a kprobe on kfree_skb() fires the eBPF bytecode in the kernel, which writes into a BPF map; the userspace tracer loads the program with BPF_PROG_LOAD and reads the map with the BPF_MAP_* syscalls.
      void kfree_skb(struct sk_buff *skb)
      {
          if (unlikely(!skb))
              return;
          ....
      }
  49. Uprobe
  50. The same flow works for userspace: the user program loads the eBPF program with BPF_PROG_LOAD (returning an fd), opens a uprobe perf event (another fd), and attaches the program to the event with the PERF_EVENT_IOC_SET_BPF ioctl.
  51. uprobe example (bpf_tracer.c): a uprobe on glibc's __libc_malloc() fires the eBPF bytecode in the kernel.
      void *__libc_malloc(size_t bytes)
      {
          arena_lookup(ar_ptr);
          arena_lock(ar_ptr, bytes);
          ....
      }
  52. How to use eBPF?
  53. Linux Kernel >= 4.1
  54. Kernel Config ● CONFIG_BPF=y ● CONFIG_BPF_SYSCALL=y ● CONFIG_BPF_JIT=y ● CONFIG_HAVE_BPF_JIT=y ● CONFIG_BPF_EVENTS=y
  55. BPF ASM
  56. BPF ASM Restricted C
  57. LLVM >= 3.7
  58. clang: -emit-llvm / llc: -march=bpf
  59. C code → (clang) → LLVM IR Bitcode → (llc) → BPF Bytecode
  60. The user program (userspace) and the kernel program (eBPF, kernel) communicate through an eBPF map. Keep the kernel program as simple as possible; the user program can be whatever you want.
  61. BPF Compiler Collection
  62. obs://Base:System/bcc
  63. C & Python Library Built-in BPF compiler
  64. Hello World
      from bcc import BPF
      bpf_prog = """
      int kprobe__sys_clone(void *ctx)
      {
          bpf_trace_printk("Hello, World\\n");
          return 0;
      }
      """
      BPF(text=bpf_prog).trace_print()
  65. Access Map
      In bitehist.c:
          BPF_HISTOGRAM(dist);
          dist.increment(bpf_log2l(req->__data_len / 1024));
      In bitehist.py:
          b = BPF(src_file = "bitehist.c")
          b["dist"].print_log2_hist("kbytes")
  66. Access Map (Cont.)
      # ./bitehist.py
      Tracing... Hit Ctrl-C to end. ^C
           kbytes         : count     distribution
             0 -> 1       : 8        |******                                  |
             2 -> 3       : 0        |                                        |
             4 -> 7       : 51       |****************************************|
             8 -> 15      : 8        |******                                  |
            16 -> 31      : 1        |                                        |
            32 -> 63      : 3        |**                                      |
            64 -> 127     : 2        |*                                       |
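The buckets in that output come from log2 binning: each slot covers a power-of-two range, so bpf_log2l() only has to compute a bit position in the kernel. A rough Python model of the slot layout (an illustration of the bucketing, not bcc's exact implementation):

```python
def log2_slot(value):
    """Slot index in a log2 histogram: values 0 and 1 share slot 0,
    and slot s (s >= 1) covers [2**s, 2**(s+1) - 1]."""
    return max(value.bit_length() - 1, 0)

def slot_range(s):
    """Human-readable range label for slot s, as printed by the tool."""
    lo = 0 if s == 0 else 2 ** s
    hi = 2 ** (s + 1) - 1
    return "%d -> %d" % (lo, hi)

# The 51 I/Os of 4 to 7 kbytes above all land in the same bucket:
for kb in (4, 5, 6, 7):
    assert log2_slot(kb) == 2
print(slot_range(log2_slot(5)))  # -> 4 -> 7
```

Fixed power-of-two buckets keep the in-kernel work trivial while still showing the shape of the distribution; print_log2_hist() then renders the map as the ASCII chart on the slide.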
  67. memleak.py
      if not kernel_trace:
          print("Attaching to malloc and free in pid %d, "
                "Ctrl+C to quit." % pid)
          bpf_program.attach_uprobe(name="c", sym="malloc",
                                    fn_name="alloc_enter", pid=pid)
          bpf_program.attach_uretprobe(name="c", sym="malloc",
                                       fn_name="alloc_exit", pid=pid)
          bpf_program.attach_uprobe(name="c", sym="free",
                                    fn_name="free_enter", pid=pid)
      else:
          print("Attaching to kmalloc and kfree, Ctrl+C to quit.")
          bpf_program.attach_kprobe(event="__kmalloc", fn_name="alloc_enter")
          bpf_program.attach_kretprobe(event="__kmalloc", fn_name="alloc_exit")
          bpf_program.attach_kprobe(event="kfree", fn_name="free_enter")
  68. memleak.py (alloc_enter)
      BPF_HASH(sizes, u64);
      BPF_HASH(allocs, u64, struct alloc_info_t);
      int alloc_enter(struct pt_regs *ctx, size_t size)
      {
          ...
          u64 pid = bpf_get_current_pid_tgid();
          u64 size64 = size;
          sizes.update(&pid, &size64);
          ...
      }
  69. memleak.py (alloc_exit)
      BPF_HASH(sizes, u64);
      BPF_HASH(allocs, u64, struct alloc_info_t);
      int alloc_exit(struct pt_regs *ctx)
      {
          u64 address = ctx->ax;
          u64 pid = bpf_get_current_pid_tgid();
          u64 *size64 = sizes.lookup(&pid);
          struct alloc_info_t info = {0};
          if (size64 == 0)
              return 0; // missed alloc entry
          info.size = *size64;
          sizes.delete(&pid);
          info.timestamp_ns = bpf_ktime_get_ns();
          info.num_frames = grab_stack(ctx, &info) - 2;
          allocs.update(&address, &info);
          ...
      }
  70. memleak.py (free)
      BPF_HASH(sizes, u64);
      BPF_HASH(allocs, u64, struct alloc_info_t);
      int free_enter(struct pt_regs *ctx, void *address)
      {
          u64 addr = (u64)address;
          struct alloc_info_t *info = allocs.lookup(&addr);
          if (info == 0)
              return 0;
          allocs.delete(&addr);
          ...
      }
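The three handlers cooperate through the two hash maps: alloc_enter records the requested size keyed by pid, alloc_exit moves it into the allocations map keyed by the returned address, and free_enter removes it; whatever remains in allocs is a candidate leak. The bookkeeping can be modeled with plain dicts (a simplified userspace sketch of the map logic, not the BPF code itself):

```python
# Simplified model of memleak.py's two maps.
sizes = {}   # pid -> requested size (bridges entry and return probes)
allocs = {}  # address -> size of a still-outstanding allocation

def alloc_enter(pid, size):
    """uprobe on malloc: remember the requested size for this pid."""
    sizes[pid] = size

def alloc_exit(pid, address):
    """uretprobe on malloc: attach the remembered size to the address."""
    size = sizes.pop(pid, None)
    if size is None:
        return  # missed the alloc entry
    allocs[address] = size

def free_enter(address):
    """uprobe on free: the allocation is no longer outstanding."""
    allocs.pop(address, None)

# One allocation is freed, one leaks (addresses are made up):
alloc_enter(1234, 64);  alloc_exit(1234, 0xdead0000)
alloc_enter(1234, 128); alloc_exit(1234, 0xdead1000)
free_enter(0xdead0000)
print(allocs)  # only the 128-byte allocation is still outstanding
```

Keying the in-flight size by pid is what lets the entry and return probes of the same malloc() call find each other, since BPF programs cannot keep state on the stack between two probe firings.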
  71. Demo
  72. Questions?
  73. Thank You
  74. References
      ● Documentation/networking/filter.txt
      ● http://www.brendangregg.com/blog/2015-05-15/ebpf-one-small-step.html
      ● https://suchakra.wordpress.com/2015/05/18/bpf-internals-i/
      ● https://suchakra.wordpress.com/2015/08/12/bpf-internals-ii/
      ● https://lkml.org/lkml/2013/9/30/627
      ● https://lwn.net/Articles/612878/
      ● https://lwn.net/Articles/650953/
      ● https://github.com/iovisor/bcc