BPF / XDP: August Seminar, KossLab

Understanding BPF / XDP

Published in: Technology

  1. Linux Kernel - BPF / XDP. KossLab: Yoo Taehee (유태희), Song Taeung (송태웅)
  2. What is BPF? 1) Berkeley Packet Filter, since 1992 2) Kernel infrastructure
  3. What is BPF? 1) Berkeley Packet Filter, since 1992 2) Kernel infrastructure: a) Interpreter (in-kernel virtual machine) b) Hook points (in-kernel callback points) c) Maps d) Helpers
  4. What is BPF? “Safe dynamic programs and tools”: a technique for safely inserting code into the kernel at runtime.
  5. BPF Infrastructure: a strategy for safe code injection. 1) Use BPF instructions instead of native machine code. 2) Use the verifier to check for hazards in advance. 3) When an (existing) kernel function is needed, call it only through a helper function.
  6. BPF Infrastructure: a strategy for safe code injection. 1) Use BPF instructions instead of native machine code.
  7. BPF Infrastructure: a strategy for safe code injection. 2) Use the verifier to check for hazards in advance.
  8. BPF Infrastructure: a strategy for safe code injection. 3) When an (existing) kernel function is needed, call it only through a helper function.
  9. BPF Infrastructure: the building blocks for safe code injection. Kernel += BPF interpreter (in-kernel virtual machine) + verifier + BPF helper functions (leveraging kernel functions) + BPF syscall (prog/map loading, attaching, etc.)
  10. BPF Instruction set: 1) A "junior" x86 instruction set, i.e. a simplified x86 (see also: the failure of PLUMgrid's x86 bytecode verifier). 2) BPF = classic BPF: 10% + x86: 70% + arm64: 25% + risc: 5%. 3) Fixed instruction encoding size (for high interpreter speed). 4) Simplification makes hazards easier to predict and prevent (the verifier checks loops, memory access ranges, and so on). 5) Architecture-independent.
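Point 4 on this slide is easier to picture with a toy. The sketch below is not the kernel verifier; it is a minimal userland model, under the assumption of a made-up three-operation instruction format (`toy_insn`, `toy_verify` and the `TOY_*` names are all hypothetical), that rejects backward jumps (possible loops) and one-byte stack stores outside the 512-byte BPF stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified instruction format (illustration only). */
enum toy_op { TOY_ALU, TOY_JMP, TOY_STACK_STORE, TOY_EXIT };

struct toy_insn {
    enum toy_op op;
    int16_t off; /* TOY_JMP: offset relative to the next instruction;
                    TOY_STACK_STORE: byte offset from the frame pointer */
};

#define TOY_STACK_SIZE 512 /* BPF programs get a 512-byte stack */

/* Returns true when the program passes both toy checks. */
bool toy_verify(const struct toy_insn *prog, size_t len)
{
    if (len == 0 || prog[len - 1].op != TOY_EXIT)
        return false; /* toy rule: the last instruction must be exit */

    for (size_t i = 0; i < len; i++) {
        const struct toy_insn *in = &prog[i];

        if (in->op == TOY_JMP) {
            if (in->off < 0)
                return false; /* backward jump: a possible loop */
            if (i + 1 + (size_t)in->off >= len)
                return false; /* jump target outside the program */
        } else if (in->op == TOY_STACK_STORE) {
            /* a one-byte store is valid only at fp-512 .. fp-1 */
            if (in->off >= 0 || in->off < -TOY_STACK_SIZE)
                return false;
        }
    }
    return true;
}
```

A caller would build an array of `toy_insn` and refuse to "load" it when `toy_verify()` returns false. The real verifier does far more (register state tracking, control-flow graph analysis), as the Additional Topics slide notes.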
  11. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8
      $ cat include/uapi/linux/bpf.h
      [...]
      struct bpf_insn {
          __u8  code;      /* opcode */
          __u8  dst_reg:4; /* dest register */
          __u8  src_reg:4; /* source register */
          __s16 off;       /* signed offset */
          __s32 imm;       /* signed immediate constant */
      };
      [...]
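To make the fixed 64-bit layout concrete, here is a small userland sketch that packs the five fields by hand, in the order the slide gives (imm:32, off:16, src:4, dst:4, opcode:8, from most to least significant). The helper names (`insn_encode`, `insn_off`, ...) are made up for this example; kernel code uses the `struct bpf_insn` bit-fields directly.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one instruction into a 64-bit word:
 * imm:32 | off:16 | src:4 | dst:4 | opcode:8 (most to least significant). */
uint64_t insn_encode(uint8_t code, uint8_t dst, uint8_t src,
                     int16_t off, int32_t imm)
{
    assert(dst < 16 && src < 16); /* register fields are 4 bits each */
    return (uint64_t)code
         | (uint64_t)dst << 8
         | (uint64_t)src << 12
         | (uint64_t)(uint16_t)off << 16
         | (uint64_t)(uint32_t)imm << 32;
}

/* Field extractors, the inverse of insn_encode(). */
uint8_t insn_opcode(uint64_t w) { return (uint8_t)(w & 0xff); }
uint8_t insn_dst(uint64_t w)    { return (uint8_t)((w >> 8) & 0xf); }
uint8_t insn_src(uint64_t w)    { return (uint8_t)((w >> 12) & 0xf); }
int16_t insn_off(uint64_t w)    { return (int16_t)(uint16_t)(w >> 16); }
int32_t insn_imm(uint64_t w)    { return (int32_t)(uint32_t)(w >> 32); }
```

For instance, `insn_encode(0xb7, 1, 0, 0, 1)` corresponds to a 64-bit move of the immediate 1 into register 1 (opcode 0xb7 = ALU64 class | MOV | immediate source), and every instruction occupies exactly eight bytes, which keeps the interpreter's fetch step trivial.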
  12. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8. The 8-bit opcode splits into the instruction class (low 3 bits) and class-specific fields (high 5 bits). eBPF: include/uapi/linux/bpf.h; cBPF: include/uapi/linux/bpf_common.h
  13. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8. The 8-bit opcode splits into the instruction class (low 3 bits) and class-specific fields (high 5 bits): LD/ST fields (mode, size) or ALU/JMP fields (operation, source). eBPF: include/uapi/linux/bpf.h; cBPF: include/uapi/linux/bpf_common.h. LD/ST classes: 0x00 ~ 0x03; ALU/JMP classes: 0x04 ~ 0x07
  14. BPF Instruction set: same as slide 13.
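The opcode byte can be taken apart with a few masks. The sketch below mirrors the masks used by the `BPF_CLASS`, `BPF_SIZE`, `BPF_MODE`, `BPF_OP` and `BPF_SRC` macros in include/uapi/linux/bpf_common.h; the function and constant names here (`opcode_class`, `CLS_*`, ...) are made up so the example compiles without kernel headers.

```c
#include <stdint.h>

/* Instruction classes: 0x00-0x03 are loads/stores, 0x04-0x07 are ALU/jump. */
enum {
    CLS_LD = 0x00, CLS_LDX = 0x01, CLS_ST = 0x02, CLS_STX = 0x03,
    CLS_ALU = 0x04, CLS_JMP = 0x05, CLS_ALU64 = 0x07
};

/* The class is the low 3 bits of the opcode byte. */
uint8_t opcode_class(uint8_t opcode) { return opcode & 0x07; }

/* True for the LD, LDX, ST and STX classes. */
int is_load_store(uint8_t opcode) { return opcode_class(opcode) <= CLS_STX; }

/* LD/ST layout: size in bits 3-4, mode in bits 5-7. */
uint8_t ldst_size(uint8_t opcode) { return opcode & 0x18; }
uint8_t ldst_mode(uint8_t opcode) { return opcode & 0xe0; }

/* ALU/JMP layout: source flag in bit 3, operation in bits 4-7. */
uint8_t alu_source(uint8_t opcode) { return opcode & 0x08; }
uint8_t alu_op(uint8_t opcode)     { return opcode & 0xf0; }
```

Applied to 0x63 (a 32-bit register store to memory), `opcode_class()` yields 0x03 (STX) and `ldst_mode()` yields 0x60 (MEM); applied to 0xb7, the class is 0x07 (ALU64) and `alu_op()` is 0xb0 (MOV).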
  15. BPF Instruction set:
      struct bpf_insn prog[] = {
          BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
          BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */),
          BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
          BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
          BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
          BPF_LD_MAP_FD(BPF_REG_1, map_fd),
          BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
          BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
          BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
          BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
          BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
          BPF_EXIT_INSN(),
      };
      https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/samples/bpf/sock_example.c
  16. BPF helper functions:
      $ grep BPF_CALL
      kernel/bpf/helpers.c:
      BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
      BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key, [...]
      kernel/trace/bpf_trace.c:
      BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
      BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
      BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
      BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1, [...]
      net/core/filter.c:
      BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
      BPF_CALL_3(bpf_skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
      [...]
  17. BPF as a kernel subproject: “Safe dynamic programs and tools”
      $ cat MAINTAINERS | grep -A 3 BPF
      BPF (Safe dynamic programs and tools)
      M: Alexei Starovoitov <ast@kernel.org>
      M: Daniel Borkmann <daniel@iogearbox.net>
      L: netdev@vger.kernel.org
      [...]
  18. BPF as a kernel subproject: “Safe dynamic programs and tools”
      $ cat MAINTAINERS | grep -A 27 BPF
      BPF (Safe dynamic programs and tools)
      [...]
      F: arch/x86/net/bpf_jit*
      [...]
      F: kernel/bpf/
      F: kernel/trace/bpf_trace.c
      [...]
      F: net/core/filter.c
      F: net/sched/act_bpf.c
      F: net/sched/cls_bpf.c
      [...]
      [...]
      F: samples/bpf/
      F: tools/bpf/
      F: tools/lib/bpf/
      F: tools/testing/selftests/bpf/
  19. BPF as a kernel subproject: same MAINTAINERS listing as slide 18. Architectures with JIT support: x86, arm, arm64, sparc, s390, powerpc, mips
  20. BPF as a kernel subproject: same MAINTAINERS listing as slide 18. BPF core: syscall, interpreter, verifier, generic helpers, maps, ...
  21. BPF as a kernel subproject: same MAINTAINERS listing as slide 18. Hook points, specific helpers, ...; cBPF support, ...
  22. BPF as a kernel subproject: same MAINTAINERS listing as slide 18. BPF loading (library), bpftool, test code, samples, ...
  23. BPF Infrastructure: support for putting BPF programs to use. 1) Hook points: in-kernel callback points 2) Maps: user-to-kernel shared memory 3) Leveraging kernel functions by calling them through helpers 4) Object pinning: /sys/fs/bpf/...
  24. BPF Architecture (diagram): in user space, BPF Controller 1 and BPF Controller 2 (user apps) use the BPF library libbpf (prog/map load, attach, control), and ip / tc use the BPF library in iproute2; both reach KERNEL SPACE through the bpf() SYSCALL. In kernel space, BPF programs call kernel functions (func()) through helpers, and Map 1, Map 2, ... (shared memory) connect the programs with the controllers.
  25. XDP
  26. Is iptables fast enough?
  27. Why is iptables slow?
  28. Have you ever tuned your iptables rules?
  29. XDP (eXpress Data Path)
  30. XDP == FAST PATH
  31. NORMAL PATH (diagram). Receive side: RX, DD (device driver), L3 input, TC ingress, PREROUTING, ROUTING, then either INPUT, L4 (TCP/UDP), L7 (APP) or FORWARD. Transmit side: OUTPUT, ROUTING, POSTROUTING, TC egress, L3 output, DD, TX.
  32. NORMAL PATH: same diagram as slide 31.
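  33. XDP FAST PATH (diagram): a BPF program at the device driver (DD) REDIRECTs packets from RX to TX; the L3, L4 and L7 (APP) layers are shown above the path.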
  34. Tutorial
  35. Prerequisites: 1. One build machine 2. One test machine (x86 recommended) 3. Kernel source code 4. clang + llvm (compiler) 5. bpftool (BPF program loader) 6. An iproute2 package with BPF support
  36. Compiler: clang + llvm
  37. Kernel source code: the bpf tree on git.kernel.org, https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
  38. BPF program loader: bpftool, https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/tools/bpf/bpftool
  39. XDP configuration tool: iproute2, https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/iproute2.git
  40. samples/bpf examples: kernel source code and BPF sample code
  41. samples/bpf example (xdp_rxq_info_kern.c): walking through sample code in the kernel source
  42. Compiling samples/bpf: hands-on BPF program compilation
  43. Loading the program:
      $ mount bpffs /sys/fs/bpf -t bpf
      $ bpftool prog load ./xdp_rxq_info_kern.o /sys/fs/bpf/xdp
  44. Checking the program:
      $ ls /sys/fs/bpf/
      $ ./bpftool prog list
      $ ./bpftool prog dump xlated id X (or: dump jited id X)
  45. Attaching the XDP program:
      $ ip link set dev lo xdp pin /sys/fs/bpf/xdp
  46. Verifying the attachment:
      $ ip link show dev lo
  47. Detaching the XDP program:
      $ ip link set dev lo xdp off
      $ rm /sys/fs/bpf/xdp
  48. iptables vs XDP
  49. TEST NETWORK (diagram): PC2 (192.168.4.2) sends ICMP to PC1 (192.168.4.1) with $ ping
  50. Dropping packets with iptables (DROP)
  51. On PC2: $ ping 192.168.4.1
      On PC1: $ iptables -A INPUT -s 192.168.4.2 -d 192.168.4.1 -p icmp -j DROP
  52. NORMAL PATH: same diagram as slide 31.
  53. NORMAL PATH: same diagram as slide 31, with the DROP point marked.
  54. Dropping packets with XDP (DROP)
  55. Loading and attaching the XDP program:
      $ ./bpftool prog load ./xdp_icmp.o /sys/fs/bpf/xdp_icmp
      $ ip link set dev lo xdp pin /sys/fs/bpf/xdp_icmp
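The slides do not show the source of xdp_icmp.o. As a guess at the kind of decision such a program makes, the userland sketch below inspects a raw Ethernet frame and returns a drop verdict for IPv4 ICMP, checking bounds before every header access the way an XDP program must check against `data_end`. All names (`icmp_filter`, `ACT_*`, the header constants) are assumptions for this example.

```c
#include <stddef.h>
#include <stdint.h>

enum { ACT_DROP = 1, ACT_PASS = 2 }; /* same numeric values as XDP_DROP / XDP_PASS */

#define ETH_HDR_LEN     14
#define IPV4_MIN_HDR    20
#define ETHERTYPE_IPV4  0x0800
#define IP_PROTO_ICMP   1

/* Decide the fate of one frame. data and data_end delimit the packet
 * bytes, like ctx->data and ctx->data_end in a real XDP program. */
int icmp_filter(const uint8_t *data, const uint8_t *data_end)
{
    size_t len = (size_t)(data_end - data);

    /* Bounds check before reading the Ethernet header. */
    if (len < ETH_HDR_LEN)
        return ACT_PASS;

    /* EtherType is stored big-endian in bytes 12 and 13. */
    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype != ETHERTYPE_IPV4)
        return ACT_PASS;

    /* Bounds check before reading the minimal IPv4 header. */
    if (len < ETH_HDR_LEN + IPV4_MIN_HDR)
        return ACT_PASS;

    /* Byte 9 of the IPv4 header is the protocol field. */
    const uint8_t *ip = data + ETH_HDR_LEN;
    return ip[9] == IP_PROTO_ICMP ? ACT_DROP : ACT_PASS;
}
```

In a real XDP program the same checks would be written against `struct xdp_md`, and the verifier would reject the program if any header read were not preceded by a comparison with `data_end`.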
  56. XDP GENERIC PATH (diagram): the normal path of slide 31, with the BPF program running on the receive side right after the device driver (DD), ahead of TC ingress; the DROP point is marked.
  57. BPF Tracing: iptables path vs XDP path
  58. BPF Tracing: iptables, DROP case. netif_receive_skb_internal(), then ipt_do_table(), then DROP.
  59. BPF Tracing: iptables, DROP case. netif_receive_skb_internal(), then ipt_do_table(), then DROP: a long time!
  60. BPF Tracing: XDP, DROP case. netif_receive_skb_internal(), then do_xdp_generic(), then DROP.
  61. BPF Tracing: XDP, DROP case. netif_receive_skb_internal(), then do_xdp_generic(), then DROP: a short time!
  62. BPF Tracing: iptables vs XDP, DROP case. From netif_receive_skb_internal(), the path through ipt_do_table() to DROP takes a long time; the path through do_xdp_generic() to DROP takes a short time.
  63. BPF Tracing: iptables vs XDP, DROP case. Functions involved:
      net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb)
      net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) (DROP)
      net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) (DROP)
  64. BPF Tracing: iptables vs XDP, DROP case. Same function list as slide 63.
  65. BPF Tracing: iptables vs XDP, DROP case. Same function list as slide 63, with the BPF attach points marked: the beginning of netif_receive_skb_internal(), and the return points of do_xdp_generic() and ipt_do_table().
  66. BPF Tracing: iptables vs XDP, DROP case. Program attached at the beginning of netif_receive_skb_internal():
      SEC("kprobe/netif_receive_skb_internal")
      int bpf_trace_receive_skb(struct pt_regs *ctx)
      {
          long skb_ptr = PT_REGS_PARM1(ctx);
          u64 start_time = bpf_ktime_get_ns();
          /* remember when this skb entered the receive path */
          bpf_map_update_elem(&tracing_map, &skb_ptr, &start_time, BPF_ANY);
          return 0;
      }
  67. BPF Tracing: iptables vs XDP, DROP case. Same program as slide 66.
  68. BPF Tracing: iptables vs XDP, DROP case. Program attached at the return of do_xdp_generic():
      SEC("kretprobe/do_xdp_generic")
      int bpf_trace_xdp_drop(struct pt_regs *ctx)
      {
          long skb_ptr = PT_REGS_PARM2(ctx);
          int action = PT_REGS_RC(ctx);
          if (action == XDP_DROP) {
              u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
              if (!time) /* the verifier rejects an unchecked dereference */
                  return 0;
              u64 cur_time = bpf_ktime_get_ns();
              u64 delta = cur_time - *time;
              *time = delta;
              ...
  69. BPF Tracing: iptables vs XDP, DROP case. Same program as slide 68.
  70. BPF Tracing: iptables vs XDP, DROP case. Program attached at the return of ipt_do_table():
      SEC("kretprobe/ipt_do_table")
      int bpf_trace_iptables_drop(struct pt_regs *ctx)
      {
          long skb_ptr = PT_REGS_PARM1(ctx);
          int action = PT_REGS_RC(ctx);
          if (action == NF_DROP) {
              u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
              if (!time) /* the verifier rejects an unchecked dereference */
                  return 0;
              u64 cur_time = bpf_ktime_get_ns();
              u64 delta = cur_time - *time;
              *time = delta;
              ...
  71. BPF Tracing: iptables vs XDP, DROP case. Same program as slide 70.
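The entry/return pairing used by these probes can be tried out in plain userland C. The sketch below is only a model of the idea: a small fixed-size table stands in for `tracing_map`, timestamps are passed in by the caller instead of coming from `bpf_ktime_get_ns()`, and every name (`on_receive_entry`, `on_drop_return`, `SLOTS`) is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64 /* hypothetical capacity; a real BPF hash map is sized at creation */

struct slot { uint64_t key; uint64_t start_ns; int used; };
static struct slot tracing_map[SLOTS];

/* Linear search for the slot holding this key, or NULL. */
static struct slot *map_lookup(uint64_t key)
{
    for (size_t i = 0; i < SLOTS; i++)
        if (tracing_map[i].used && tracing_map[i].key == key)
            return &tracing_map[i];
    return NULL;
}

/* Entry probe: record when the skb entered the receive path.
 * Returns 0 on success, -1 when the table is full. */
int on_receive_entry(uint64_t skb_ptr, uint64_t now_ns)
{
    struct slot *s = map_lookup(skb_ptr);
    for (size_t i = 0; !s && i < SLOTS; i++)
        if (!tracing_map[i].used)
            s = &tracing_map[i];
    if (!s)
        return -1;
    s->key = skb_ptr;
    s->start_ns = now_ns;
    s->used = 1;
    return 0;
}

/* Return probe on a drop verdict: compute the elapsed time and free the slot.
 * Returns 1 and writes *delta_ns when an entry timestamp existed, else 0. */
int on_drop_return(uint64_t skb_ptr, uint64_t now_ns, uint64_t *delta_ns)
{
    struct slot *s = map_lookup(skb_ptr);
    if (!s)
        return 0; /* same NULL check a BPF program needs after a lookup */
    *delta_ns = now_ns - s->start_ns;
    s->used = 0;
    return 1;
}
```

Keying on the skb pointer is what lets the return probe find the timestamp written by the entry probe, which is the role `skb_ptr` plays in the slide code.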
  72. Ftrace Tracing: iptables path vs XDP path
  73. iptables path:
      $ cat /sys/kernel/debug/tracing/trace
      netif_receive_skb_internal() {
        ktime_get_with_offset();
        __netif_receive_skb() {
          __netif_receive_skb_core() {
            ip_rcv() {
              pskb_trim_rcsum_slow();
              nf_hook_slow() {
                iptable_mangle_hook() {
                  ipt_do_table() {
                    __local_bh_enable_ip();
                  }
                }
              }
              ip_rcv_finish() {
                udp_v4_early_demux();
                ip_route_input_noref() {
                  ip_route_input_rcu() {
                    ip_route_input_slow() {
                      fib_table_lookup();
                      fib_validate_source() {
                        __fib_validate_source() {
                          fib_table_lookup();
                        }
                      }
                    }
                  }
                }
                ip_local_deliver() {
                  nf_hook_slow() {
                    iptable_mangle_hook() {
                      ipt_do_table() {
                        __local_bh_enable_ip();
                      }
                    }
                    iptable_filter_hook() {
                      ipt_do_table() {
                        udp_mt();
                        __local_bh_enable_ip();
                      }
                    }
                    kfree_skb()
  74. iptables path: same trace as slide 73, with the final kfree_skb() marked DROP.
  75. iptables path vs XDP path: the iptables trace of slide 73 (ending in kfree_skb(), DROP) next to the XDP trace:
      $ cat /sys/kernel/debug/tracing/trace
      netif_receive_skb_internal() {
        ktime_get_with_offset();
        do_xdp_generic() {
          pskb_expand_head() {
            __kmalloc_reserve.isra.48() {
              __kmalloc_node_track_caller() {
                kmalloc_slab();
                should_failslab();
              }
            }
            ksize();
            skb_free_head() {
              page_frag_free();
            }
            skb_headers_offset_update();
          }
          __bpf_prog_run32() {
            ___bpf_prog_run();
          }
          kfree_skb()
      Both traces end in kfree_skb() (DROP).
  76. iptables path vs XDP path: same traces as slide 75.
  77. iptables path vs XDP path: same traces as slide 75. YOU WIN !! “XDP is LOVE”
  78. BPF internals
  79. BPF Infrastructure: 1) Hook points: in-kernel callback points 2) LOAD, ATTACH, CALLBACK 3) Verifier / Interpreter / JIT 4) Maps: user-to-kernel shared memory 5) Leveraging kernel functions by calling them through helpers 6) Object pinning: /sys/fs/bpf/… ...
  80. Hook points: callback points in KERNEL SPACE. XDP: at the L2 device driver. tc: between L3 and the device driver (DD), just after the driver on ingress and just before it on egress. kprobe: function entry / return. ...
  81. Hook points: same list as slide 80. Inside a specific kernel function: if (has_bpf_prog) BPF_PROG_RUN(); ->bpf_func(ctx, insni);
  82. Hook points: same as slide 81, with BPF programs placed at each hook point: BPF prog injection !!
  83. Hook points: same as slide 82. ->bpf_func points to the BPF interpreter or to JIT-compiled machine code.
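The `if (has_bpf_prog) BPF_PROG_RUN()` pattern boils down to a function-pointer call guarded by a NULL check. Here is a minimal userland model of it; `toy_prog`, `toy_hook` and `toy_run_hook` are invented names, and the program body is reduced to a stored verdict so that the "interpreter" has something to return.

```c
#include <stddef.h>

struct toy_prog;
typedef unsigned int (*toy_bpf_func_t)(const void *ctx, const struct toy_prog *prog);

struct toy_prog {
    toy_bpf_func_t bpf_func; /* interpreter entry or JIT-compiled code */
    unsigned int verdict;    /* stand-in for the program's instructions */
};

/* Stand-in for the interpreter: "runs" the program by returning its verdict. */
unsigned int toy_interpreter(const void *ctx, const struct toy_prog *prog)
{
    (void)ctx;
    return prog->verdict;
}

struct toy_hook {
    const struct toy_prog *prog; /* NULL while nothing is attached */
};

#define TOY_DEFAULT_VERDICT 2u /* hypothetical "carry on as usual" value */

/* What a kernel function containing a hook point does, in miniature. */
unsigned int toy_run_hook(const struct toy_hook *hook, const void *ctx)
{
    if (hook->prog)                                   /* if (has_bpf_prog) */
        return hook->prog->bpf_func(ctx, hook->prog); /* BPF_PROG_RUN()    */
    return TOY_DEFAULT_VERDICT;
}
```

Because the caller only sees `bpf_func`, swapping the interpreter for JIT-compiled code changes nothing at the hook point; only the pointer stored at load time differs.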
  84. Hook points: same as slide 82. BPF prog injection !! HOW ?
  85. Loading (diagram): C source (_kern.c) is compiled with clang / llc into a BPF program (BPF bytecode, or a BPF ELF object). tc / ip, using the BPF library in iproute2, hand it to KERNEL SPACE through the bpf() SYSCALL.
  86. Loading: same diagram as slide 85. Loader steps: 1. ELF parsing 2. First-stage relocation: 1) map fd 2) bpf-to-bpf call
  87. Loading: same as slide 86, with Map 1 (shared memory) shown in kernel space.
  88. Loading: same as slide 87. Loader steps, continued: 3. Loading (BPF_PROG_LOAD). BPF prog injection !!
  89. Loading: same as slide 88, with a BPF Controller (user app) using the BPF library libbpf (prog/map load, attach, control) as another loader.
  90. Loading: same as slide 89. HOW ? in bpf()
  91. Loading: same diagram as slide 89. BPF LOAD steps inside the kernel: 1. BPF prog / map alloc 2. Verifier (loops, memory access ranges)
  92. Loading: same as slide 91. BPF LOAD steps, continued: 3. Second-stage relocation: 1) map fd → map ptr 2) helper ID → func addr
  93. Loading: same as slide 92. BPF LOAD steps, continued: 4. Select runtime: 1) BPF interpreter func addr 2) func addr of the JIT-compiled BPF program. Finally, return fd;
  94. Loading: same as slide 93, alongside the hook-point code that later runs the program: if (has_bpf_prog) BPF_PROG_RUN(); ->bpf_func(ctx, insni);
  95. Attaching: same diagram as slide 89, with Map 1, Map 2, ... (shared memory). Various ways to attach BPF: - socket(), send() with AF_NETLINK - bpf() syscall: BPF_PROG_ATTACH, BPF_RAW_TRACEPOINT_OPEN - kprobe event id, ioctl() PERF_EVENT_IOC_SET_BPF ...
  96. Callback (diagram): the attached BPF programs are invoked at their hook points. BPF CALLBACK !!
  97. Helpers (diagram): BPF programs call kernel functions (func()) through helpers. Leveraging kernel functions through BPF helper functions !!
  98. Maps (diagram): BPF Controller 1 and BPF Controller 2 (user apps, using libbpf) share Map 1, Map 2, ... with the BPF programs. User-to-kernel memory sharing through BPF maps.
  99. BPF Architecture: same diagram as slide 24.
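  100. XDP internals
  101. XDP return types: XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT
  102. XDP return types (diagram): at the network device driver, the BPF program's return value decides where the packet goes: XDP_DROP, XDP_PASS (up to the APP), XDP_TX, or XDP_REDIRECT.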
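The five return codes can be summarised as a mapping from verdict to packet fate. The sketch below mirrors the numeric values of `enum xdp_action` from include/uapi/linux/bpf.h under `TOY_`-prefixed names so it builds without kernel headers; the `toy_fate` classification and `toy_xdp_fate()` are assumptions made for this example, not kernel code.

```c
/* Same numeric values as enum xdp_action in include/uapi/linux/bpf.h. */
enum toy_xdp_action {
    TOY_XDP_ABORTED = 0,
    TOY_XDP_DROP,
    TOY_XDP_PASS,
    TOY_XDP_TX,
    TOY_XDP_REDIRECT
};

/* Hypothetical summary of what happens to the packet afterwards. */
enum toy_fate {
    FATE_FREED,         /* packet is discarded */
    FATE_TO_STACK,      /* packet continues up the normal network stack */
    FATE_SENT_SAME_NIC, /* packet is transmitted back out of the receiving NIC */
    FATE_SENT_ELSEWHERE /* packet is handed to another device, CPU or socket */
};

enum toy_fate toy_xdp_fate(unsigned int action)
{
    switch (action) {
    case TOY_XDP_PASS:     return FATE_TO_STACK;
    case TOY_XDP_TX:       return FATE_SENT_SAME_NIC;
    case TOY_XDP_REDIRECT: return FATE_SENT_ELSEWHERE;
    case TOY_XDP_DROP:     return FATE_FREED;
    case TOY_XDP_ABORTED:  /* program error: the packet is dropped as well */
    default:               /* unknown codes are treated like an abort */
        return FATE_FREED;
    }
}
```

Treating unknown values like an abort is a deliberately conservative choice in this model: a verdict the caller does not understand never lets a packet through.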
  103. Generic XDP vs Driver XDP
  104. XDP GENERIC PATH (diagram): the normal path of slide 31, with the BPF program running on the receive side right after the device driver (DD), ahead of TC ingress.
  105. XDP GENERIC PATH: same diagram as slide 104.
  106. DRIVER XDP PATH (diagram): the BPF program runs at L2, between RX and the upper layers; REDIRECT sends the packet to TX, PASS sends it up through L3, L4 and L7 (APP).
  107. DRIVER XDP PATH: same diagram as slide 106.
  108. Driver XDP vs Generic XDP (diagram): in both, BPF sits between RX and the REDIRECT (TX) / PASS outcomes; in the generic case an L3 box is shown on the path as well.
  109. XDP data structures and SKB
  110. Buffer layout (diagram), in address order: HEADROOM, MAC HEADER, IP HEADER, ..., TAIL / TAILROOM, END. Pointers shown: xdp->data_hard_start, xdp->data_meta, xdp->data, skb->data. Also labeled: xdp_frame.
  111. Allowed range for DATA ACCESS
  112. Buffer layout (diagram), in address order: HEADROOM, MAC HEADER, IP HEADER, ..., TAIL / TAILROOM, END. Pointers shown: xdp->data_hard_start, xdp->data_meta, xdp->data.
  113. Buffer layout: same diagram as slide 112.
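The pointer layout and the allowed access range can be modelled with four pointers into one buffer. This is a userland sketch, not the kernel's `struct xdp_buff`: `toy_xdp_buff`, `toy_can_access` and `toy_adjust_head` are hypothetical names, and the head adjustment is a simplified take on what the `bpf_xdp_adjust_head()` helper does (the real helper has additional constraints, for example around metadata and minimum frame size).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_xdp_buff {
    uint8_t *data_hard_start; /* start of the headroom */
    uint8_t *data_meta;       /* optional metadata placed just before data */
    uint8_t *data;            /* first packet byte (MAC header) */
    uint8_t *data_end;        /* one past the last packet byte */
};

/* A program may only touch bytes in [data, data_end). */
bool toy_can_access(const struct toy_xdp_buff *x, size_t off, size_t len)
{
    size_t pkt_len = (size_t)(x->data_end - x->data);
    return off <= pkt_len && len <= pkt_len - off;
}

/* Move data by delta bytes: a negative delta grows the packet into the
 * headroom (e.g. to push a new header), a positive delta strips bytes. */
bool toy_adjust_head(struct toy_xdp_buff *x, int delta)
{
    ptrdiff_t headroom = x->data - x->data_hard_start;
    ptrdiff_t pkt_len = x->data_end - x->data;

    if (delta < 0 && -(ptrdiff_t)delta > headroom)
        return false; /* would run past the start of the headroom */
    if (delta > 0 && (ptrdiff_t)delta >= pkt_len)
        return false; /* would leave no packet bytes at all */

    x->data += delta;
    x->data_meta = x->data; /* toy simplification: metadata is discarded */
    return true;
}
```

The headroom is what makes encapsulation cheap: pushing a header only moves `data` backwards, with no copy of the payload.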
  114. Analyzing XDP_REDIRECT
  115. XDP_REDIRECT (diagram): a BPF APP and four devices, eth0, eth1, eth2, eth3. XDP_TX sends a packet back out of the device it arrived on; XDP_REDIRECT uses a REDIRECT MAP to send it out of a different device.
  116. XDP_REDIRECT: same diagram as slide 115.
  117. XDP_REDIRECT: same diagram as slide 115.
  118. XDP_REDIRECT through bpf_redirect()
  119. About bpf_redirect()
  120. XDP_REDIRECT: same diagram as slide 115.
  121. XDP_REDIRECT: same diagram as slide 115.
  122. XDP_REDIRECT - bulkTX
  123. bulkTX (diagram): between RX / BPF and REDIRECT TX, xdp_frames are collected through a map (eight xdp_frames shown) and transmitted in bulk.
  124. DEVMAP
  125. DEVMAP (diagram): bpf_redirect_map looks up the redirect info in DEVMAP, a table of Key to Value (Device) entries (0: X, 1: X, 2: X); xdp_frames queue up per device between RX / BPF and REDIRECT TX.
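The combination of a key-to-device lookup and bulk transmission can be sketched as follows. This is a toy userland model: `toy_devmap`, `toy_redirect_map` and `toy_flush_all` are invented names, and the batch size of 16 is an assumption chosen for the example rather than a value taken from the slides.

```c
#include <stddef.h>

#define TOY_DEVMAP_SIZE 4
#define TOY_BULK_SIZE   16 /* assumed batch size for this sketch */

struct toy_dev {
    int ifindex;                      /* 0 means the slot is empty */
    const void *queue[TOY_BULK_SIZE]; /* frames waiting for transmission */
    size_t queued;
    size_t flushes;                   /* number of bulk transmissions so far */
    size_t sent;                      /* total frames handed to the "driver" */
};

static struct toy_dev toy_devmap[TOY_DEVMAP_SIZE];

/* Transmit everything queued for one device in a single batch. */
static void toy_flush(struct toy_dev *d)
{
    if (d->queued == 0)
        return;
    d->sent += d->queued; /* a driver would transmit the whole batch here */
    d->queued = 0;
    d->flushes++;
}

/* Modelled on bpf_redirect_map(): look up the target device by key and
 * enqueue the frame. Returns 0 on success, -1 when the key has no device. */
int toy_redirect_map(unsigned int key, const void *frame)
{
    if (key >= TOY_DEVMAP_SIZE || toy_devmap[key].ifindex == 0)
        return -1;

    struct toy_dev *d = &toy_devmap[key];
    if (d->queued == TOY_BULK_SIZE)
        toy_flush(d); /* queue full: send the batch before enqueueing */
    d->queue[d->queued++] = frame;
    return 0;
}

/* Called at the end of a receive poll cycle to push out partial batches. */
void toy_flush_all(void)
{
    for (size_t i = 0; i < TOY_DEVMAP_SIZE; i++)
        toy_flush(&toy_devmap[i]);
}
```

Batching amortises the per-transmission cost over many frames, which is the point of the bulkTX slide; the final flush keeps a partially filled queue from waiting indefinitely.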
  126. CPUMAP
  127. CPUMAP (diagram): bpf_redirect_map looks up the redirect info in CPUMAP, a table of Key to Value (CPU) entries (0: X, 1: X, 2: X); xdp_frames queue up between RX / BPF and REDIRECT. Where do they go next? ???
  128. CPUMAP: same diagram as slide 127; the redirected xdp_frames are delivered to netif_receive_skb_core.
  129. REDIRECT in GENERIC_XDP
  130. BPFILTER
  131. Additional Topics: ● memory model switching ○ /net/core/xdp.c ● page pool ○ /net/core/page_pool ● offload ● AF_XDP && XSK (XDP SOCKET) ● helper functions ● Device Driver
  132. Additional Topics: ● Verifier ○ CFG, DAG, register, memory check... ● Other types ○ TC, SOCKET FILTER, CGROUP ● BTF ○ ELFutils, clang -g, llc -mattr=dwarfris ● Tail call ○ related to bpf_prog_array
  133. Additional Topics: ● Facebook's Katran ○ L4 load balancing ○ https://github.com/facebookincubator/katran ● Suricata ○ IPS/IDS engine ○ https://suricata-ids.org/ ● Cilium ○ https://cilium.io/ ● IOvisor bcc ○ https://www.iovisor.org/ ● IR Decoding ○ https://lwn.net/Articles/759188/
