Linux Kernel - BPF / XDP
KossLab: Taehee Yoo (유태희), Taeung Song (송태웅)
What is BPF?
1. Berkeley Packet Filter, since 1992
2. Kernel infrastructure
- Interpreter: in-kernel virtual machine
- Hook points: in-kernel callback points
- Map
- Helper
"Safe dynamic programs and tools"
"A technique for safely injecting code into the kernel at run time"
BPF Infrastructure:
Strategy for safe code injection
1) Use BPF instructions instead of native machine code
2) Check for hazards ahead of time with the verifier
3) When an (existing) kernel function is needed, call it only through a helper function
BPF Infrastructure:
Building blocks for safe code injection
Kernel += BPF interpreter (in-kernel virtual machine)
+ Verifier
+ BPF helper functions (leveraging kernel functions)
+ BPF syscall (prog/map: loading, attaching, etc.)
BPF Instruction set:
1) A "junior" x86 instruction set ('simplified x86')
(Note: PLUMgrid's x86 bytecode verifier attempt failed)
2) BPF = classic BPF:10% + x86:70% + arm64:25% + risc:5%
3) Fixed-size instruction encoding
(for high interpreter speed)
4) Simplified -> hazards are easier to anticipate and prevent
(the verifier checks loops, memory access ranges, etc.)
5) Architecture-independent
immediate:32 offset:16 src:4 dst:4 opcode:8
$ cat include/uapi/linux/bpf.h
[...]
struct bpf_insn {
__u8 code; /* opcode */
__u8 dst_reg:4; /* dest register */
__u8 src_reg:4; /* source register */
__s16 off; /* signed offset */
__s32 imm; /* signed immediate constant */
};
[...]
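The layout above packs every instruction into exactly 64 bits. The following is a minimal userspace sketch of that encoding, assuming the bit-field layout used by GCC and Clang. The names `struct insn` and `mov64_imm` are illustrative stand-ins that mirror `struct bpf_insn` and the idea behind the `BPF_MOV64_IMM` macro; they are not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mirror of struct bpf_insn using standard fixed-width types. */
struct insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* dest register */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed immediate constant */
};

/* 8 + 4 + 4 + 16 + 32 bits: every instruction occupies 8 bytes. */
static_assert(sizeof(struct insn) == 8, "instruction must be 64 bits");

/* Opcode building blocks (values as defined in the uapi headers). */
enum { CLS_ALU64 = 0x07, OP_MOV = 0xb0, SRC_K = 0x00 };

/* Build "dst = imm": a 64-bit move of an immediate value. */
static struct insn mov64_imm(uint8_t dst, int32_t imm)
{
    struct insn i = {
        .code = CLS_ALU64 | OP_MOV | SRC_K, /* 0xb7 */
        .dst_reg = dst & 0x0f,
        .src_reg = 0,
        .off = 0,
        .imm = imm,
    };
    return i;
}
```

Calling `mov64_imm(1, 42)` yields an instruction whose opcode byte is 0xb7, which is the same byte that shows up in dumps of real BPF programs for a 64-bit immediate move.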
BPF Instruction set:
immediate:32 offset:16 src:4 dst:4 opcode:8
opcode:8 = class (low 3 bits) + class-specific fields (high 5 bits)
LD/ST classes, 0x00 ~ 0x03: mode:3 + size:2 + class:3
ALU/JMP classes, 0x04 ~ 0x07: operation:4 + source:1 + class:3
eBPF: include/uapi/linux/bpf.h
cBPF: include/uapi/linux/bpf_common.h
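Splitting the opcode byte into its class and class-specific fields can be sketched as below. This is an illustrative decoder, not kernel code: the function names are made up for this example, and the class names follow the eBPF headers (in classic BPF, class 0x06 is RET and 0x07 is MISC; JMP32 exists only in newer eBPF kernels).

```c
#include <assert.h>
#include <stdint.h>

/* The low 3 bits of the opcode select the instruction class. */
static unsigned insn_class(uint8_t code) { return code & 0x07; }

/* Classes 0x00..0x03 are loads/stores, 0x04..0x07 are ALU/jump. */
static int is_ldst(uint8_t code) { return insn_class(code) <= 0x03; }

/* eBPF class names indexed by class number. */
static const char *class_name(uint8_t code)
{
    static const char *const names[] = {
        "LD", "LDX", "ST", "STX", "ALU", "JMP", "JMP32", "ALU64"
    };
    static_assert(sizeof names / sizeof names[0] == 8,
                  "one name per 3-bit class value");
    return names[insn_class(code)];
}

/* ALU/JMP classes: high 4 bits = operation, bit 3 = source operand
 * (0: immediate, 0x08: register). */
static unsigned alu_op(uint8_t code)  { return code & 0xf0; }
static unsigned alu_src(uint8_t code) { return code & 0x08; }

/* LD/ST classes: high 3 bits = mode, bits 3..4 = access size. */
static unsigned ldst_mode(uint8_t code) { return code & 0xe0; }
static unsigned ldst_size(uint8_t code) { return code & 0x18; }
```

For example, the opcode 0x63 (`BPF_STX | BPF_MEM | BPF_W`) decodes to class STX, mode 0x60, and size 0x00.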
BPF Instruction set:
struct bpf_insn prog[] = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
BPF_LD_MAP_FD(BPF_REG_1, map_fd),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
BPF_EXIT_INSN(),
};
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/samples/bpf/sock_example.c
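To make the instruction array above more concrete, here is a toy userspace interpreter for a handful of opcodes. It is only a sketch under stated assumptions: `toy_run` and the `OP_*` names are hypothetical, only six opcodes are handled, and helper calls, memory access, and maps are left out entirely. It is not the kernel's interpreter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct insn {
    uint8_t code;
    uint8_t dst_reg : 4;
    uint8_t src_reg : 4;
    int16_t off;
    int32_t imm;
};
static_assert(sizeof(struct insn) == 8, "instruction must be 64 bits");

enum {
    OP_ADD64_IMM = 0x07, /* BPF_ALU64 | BPF_ADD | BPF_K */
    OP_ADD64_REG = 0x0f, /* BPF_ALU64 | BPF_ADD | BPF_X */
    OP_MOV64_IMM = 0xb7, /* BPF_ALU64 | BPF_MOV | BPF_K */
    OP_MOV64_REG = 0xbf, /* BPF_ALU64 | BPF_MOV | BPF_X */
    OP_JEQ_IMM   = 0x15, /* BPF_JMP | BPF_JEQ | BPF_K */
    OP_EXIT      = 0x95, /* BPF_JMP | BPF_EXIT */
};

/* Runs prog with ctx in R1. On EXIT stores R0 in *r0 and returns 0.
 * Returns -1 on an unsupported opcode, an invalid register, a backward
 * jump, or when execution runs off the end of the program. */
static int toy_run(const struct insn *prog, size_t len, uint64_t ctx,
                   uint64_t *r0)
{
    uint64_t reg[11] = {0}; /* R0..R10 */
    reg[1] = ctx;
    for (size_t pc = 0; pc < len; pc++) {
        const struct insn *i = &prog[pc];
        if (i->dst_reg > 10 || i->src_reg > 10)
            return -1;
        uint64_t imm = (uint64_t)(int64_t)i->imm; /* sign-extend */
        switch (i->code) {
        case OP_MOV64_IMM: reg[i->dst_reg] = imm; break;
        case OP_MOV64_REG: reg[i->dst_reg] = reg[i->src_reg]; break;
        case OP_ADD64_IMM: reg[i->dst_reg] += imm; break;
        case OP_ADD64_REG: reg[i->dst_reg] += reg[i->src_reg]; break;
        case OP_JEQ_IMM:
            if (i->off < 0)
                return -1; /* this toy refuses backward jumps */
            if (reg[i->dst_reg] == imm)
                pc += (size_t)i->off; /* relative to the next insn */
            break;
        case OP_EXIT:
            *r0 = reg[0];
            return 0;
        default:
            return -1;
        }
    }
    return -1;
}
```

A three-instruction program (`r0 = 40; r0 += 2; exit`) run through `toy_run` leaves 42 in `*r0`. The fixed 8-byte encoding is what keeps the fetch and decode steps this simple.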
BPF helper functions:
$ grep BPF_CALL
kernel/bpf/helpers.c:
BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
[...]
kernel/trace/bpf_trace.c:
BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
[...]
net/core/filter.c:
BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
BPF_CALL_3(bpf_skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
[...]
BPF as a kernel subproject
"Safe dynamic programs and tools"
$ cat MAINTAINERS | grep -A 3 BPF
BPF (Safe dynamic programs and tools)
M: Alexei Starovoitov <ast@kernel.org>
M: Daniel Borkmann <daniel@iogearbox.net>
L: netdev@vger.kernel.org
[...]
$ cat MAINTAINERS | grep -A 27 BPF
BPF (Safe dynamic programs and tools)
[...]
F: arch/x86/net/bpf_jit*
[...]
F: kernel/bpf/
F: kernel/trace/bpf_trace.c
[...]
F: net/core/filter.c
F: net/sched/act_bpf.c
F: net/sched/cls_bpf.c
[...]
[...]
F: samples/bpf/
F: tools/bpf/
F: tools/lib/bpf/
F: tools/testing/selftests/bpf/
What the listed paths cover:
- arch/x86/net/bpf_jit* [...]: JIT; supported architectures: x86, arm, arm64, sparc, s390, powerpc, mips
- kernel/bpf/: BPF core (syscall, interpreter, verifier, generic helpers, maps, ...)
- kernel/trace/bpf_trace.c, net/core/filter.c, net/sched/act_bpf.c, net/sched/cls_bpf.c: hook points, specific helpers, ...; for cBPF, ...
- samples/bpf/, tools/bpf/, tools/lib/bpf/, tools/testing/selftests/bpf/: BPF loading (lib), bpf tool, test code, samples, ...
BPF Infrastructure:
Support for putting BPF programs to use
1) Hook points: in-kernel callback points
2) Map: user-to-kernel shared memory
3) Calling kernel functions through helpers (leveraging)
4) Object pinning: /sys/fs/bpf/...
BPF Architecture:
[Diagram: in user space, BPF Controller 1 and BPF Controller 2 (user apps using the BPF library libbpf for prog/map load, attach, control) and ip / tc (using the BPF library in iproute2) talk to KERNEL SPACE through the bpf() SYSCALL. In the kernel, BPF programs call kernel functions (func()) through helpers and exchange data with user space through Map 1, Map 2, ... (shared memory).]
XDP
Is iptables fast enough?
Why is iptables slow?
Have you ever tuned your iptables rules?
XDP (eXpress Data Path)
XDP == FAST PATH
NORMAL PATH
[Diagram: the normal packet path. Receive side: RX, device driver (DD), L3 input, TC ingress, PREROUTING, ROUTING, then either INPUT, L4 (TCP/UDP), L7 (APP) or FORWARD. Transmit side: OUTPUT, ROUTING, POSTROUTING, TC egress, L3 output, DD, TX.]
XDP FAST PATH
[Diagram: a BPF program at the device driver (DD) level handles the packet right after RX and can REDIRECT it straight to TX, bypassing L3, L4, and L7 (APP).]
BPF
Tutorial
Prerequisites
1. One build machine
2. One test machine (x86 recommended)
3. Kernel source code
4. clang + llvm (compiler)
5. bpftool (BPF program loader)
6. An iproute2 package with BPF support
Compiler: clang + llvm
Kernel source code: the bpf tree on git.kernel.org
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
BPF program loader: bpftool
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/tools/bpf/bpftool
XDP configuration tool: iproute2
https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/iproute2.git
Examples: the kernel source code and its BPF sample code
samples/bpf
Example: analyzing sample code in the kernel source (xdp_rxq_info_kern.c)
samples/bpf
Compiling: hands-on BPF program compilation
samples/bpf
Loading the program
$ mount bpffs /sys/fs/bpf -t bpf
$ bpftool prog load ./xdp_rxq_info_kern.o /sys/fs/bpf/xdp
Inspecting the program
$ ls /sys/fs/bpf/
$ ./bpftool prog list
$ ./bpftool prog dump xlated id X
$ ./bpftool prog dump jited id X
Attaching the XDP program
$ ip link set dev lo xdp pin /sys/fs/bpf/xdp
Checking the XDP program attachment
$ ip link show dev lo
Removing the XDP program
$ ip link set dev lo xdp off
$ rm /sys/fs/bpf/xdp
iptables vs XDP
TEST NETWORK
PC1 (192.168.4.1) <- ICMP ($ ping) <- PC2 (192.168.4.2)
Dropping packets with iptables
#PC2
$ ping 192.168.4.1
#PC1
$ iptables -A INPUT -s 192.168.4.2 -d 192.168.4.1 -p icmp -j DROP
NORMAL PATH
[Diagram: the normal packet path again (RX, DD, L3 input, TC ingress, PREROUTING, ROUTING, INPUT / FORWARD, L4 (TCP/UDP), L7 (APP), OUTPUT, POSTROUTING, TC egress, L3 output, TX), now with a DROP marker showing where the iptables rule discards the packet.]
Dropping packets with XDP
Loading and attaching the XDP program
$ ./bpftool prog load ./xdp_icmp.o /sys/fs/bpf/xdp_icmp
$ ip link set dev lo xdp pin /sys/fs/bpf/xdp_icmp
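The slides do not show the source of xdp_icmp.o. As a rough idea of the decision such a program might make, here is a userspace model of the bounds-checked header parsing, written in plain C so it can be compiled and run without kernel headers. It is an assumption-laden sketch, not the authors' program: `icmp_drop_verdict` and the local constants are hypothetical names, IPv4 options and VLAN tags are ignored, and a real XDP program would read `data` and `data_end` from `struct xdp_md` instead of taking pointers as arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Verdicts with the same numeric values as enum xdp_action. */
enum { XDP_DROP = 1, XDP_PASS = 2 };

enum {
    ETH_HDR_LEN    = 14,     /* dst MAC + src MAC + EtherType */
    IPV4_HDR_MIN   = 20,     /* IPv4 header without options */
    ETHERTYPE_IPV4 = 0x0800,
    IP_PROTO_ICMP  = 1,
};

/* Decide the fate of one frame delimited by [data, data_end).
 * Every header is length-checked before it is read, which is the
 * same discipline the verifier enforces on real XDP programs. */
static int icmp_drop_verdict(const uint8_t *data, const uint8_t *data_end)
{
    assert(data <= data_end);
    size_t len = (size_t)(data_end - data);

    if (len < ETH_HDR_LEN)
        return XDP_PASS; /* too short for an Ethernet header */

    /* EtherType is stored big-endian in bytes 12..13. */
    unsigned ethertype = ((unsigned)data[12] << 8) | data[13];
    if (ethertype != ETHERTYPE_IPV4)
        return XDP_PASS;

    if (len < ETH_HDR_LEN + IPV4_HDR_MIN)
        return XDP_PASS; /* truncated IPv4 header */

    /* Byte 9 of the IPv4 header is the protocol field. */
    const uint8_t *ip = data + ETH_HDR_LEN;
    return ip[9] == IP_PROTO_ICMP ? XDP_DROP : XDP_PASS;
}
```

The early `XDP_PASS` returns matter: anything the program does not recognise continues up the normal path, so only ICMP over IPv4 is discarded.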
XDP GENERIC PATH
[Diagram: the normal path with a BPF program inserted on the receive side, in the position the normal-path diagram labels "L3 input", ahead of TC ingress; the packet is dropped (DROP) there, before PREROUTING, ROUTING, and INPUT.]
BPF Tracing
iptables path vs XDP path
BPF Tracing:
iptables - DROP case
netif_receive_skb_internal() -> ipt_do_table() -> DROP
Long time !! ~~
BPF Tracing:
XDP - DROP case
netif_receive_skb_internal() -> do_xdp_generic() -> DROP
Short time !! ~~
BPF Tracing:
iptables vs XDP - DROP case
Functions to probe:
net/core/dev.c:
static int netif_receive_skb_internal(struct sk_buff *skb)
net/core/dev.c:
int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
net/ipv4/netfilter/ip_tables.c:
unsigned int ipt_do_table(struct sk_buff *skb, ...)
Beginning point (entry of netif_receive_skb_internal()): BPF ATTACH !!
Return points (return of do_xdp_generic() and of ipt_do_table()): BPF ATTACH !!
/* Entry probe: store the start time, keyed by the skb pointer. */
SEC("kprobe/netif_receive_skb_internal")
int bpf_trace_receive_skb(struct pt_regs *ctx)
{
	long skb_ptr = PT_REGS_PARM1(ctx);
	u64 start_time = bpf_ktime_get_ns();
	bpf_map_update_elem(&tracing_map, &skb_ptr, &start_time,
			    BPF_ANY);
	return 0;
}
/* Return probe for the XDP path: skb is the 2nd argument.
 * Note: argument registers are not guaranteed to be preserved
 * by the time a return probe fires. */
SEC("kretprobe/do_xdp_generic")
int bpf_trace_xdp_drop(struct pt_regs *ctx)
{
	long skb_ptr = PT_REGS_PARM2(ctx);
	int action = PT_REGS_RC(ctx);
	if (action == XDP_DROP) {
		u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
		if (!time) /* no start time recorded for this skb */
			return 0;
		u64 cur_time = bpf_ktime_get_ns();
		u64 delta = cur_time - *time;
		*time = delta;
		...
/* Return probe for the iptables path: skb is the 1st argument. */
SEC("kretprobe/ipt_do_table")
int bpf_trace_iptables_drop(struct pt_regs *ctx)
{
	long skb_ptr = PT_REGS_PARM1(ctx);
	int action = PT_REGS_RC(ctx);
	if (action == NF_DROP) {
		u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
		if (!time) /* no start time recorded for this skb */
			return 0;
		u64 cur_time = bpf_ktime_get_ns();
		u64 delta = cur_time - *time;
		*time = delta;
		...
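The three probes share one idea: the entry probe stores a start timestamp keyed by the skb pointer, and a return probe that sees a drop verdict replaces it with the elapsed time. The pairing logic can be modelled in plain userspace C as below. Everything here is a stand-in for illustration: the fixed-size table plays the role of the BPF hash map, timestamps are passed in instead of coming from `bpf_ktime_get_ns()`, and `SLOTS`, `on_receive`, and `on_drop` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64 /* hypothetical capacity */
static_assert((SLOTS & (SLOTS - 1)) == 0, "SLOTS must be a power of two");

struct slot {
    uintptr_t key;  /* skb pointer */
    uint64_t value; /* start time, later the measured delta (ns) */
    int used;
};

/* Userspace stand-in for the BPF map named tracing_map in the slides. */
static struct slot tracing_map[SLOTS];

/* Linear probing: returns the slot holding key, or the first free
 * slot where key could be stored, or NULL if the table is full. */
static struct slot *find_slot(uintptr_t key)
{
    size_t start = (size_t)(key >> 4) & (SLOTS - 1);
    for (size_t n = 0; n < SLOTS; n++) {
        struct slot *s = &tracing_map[(start + n) & (SLOTS - 1)];
        if (!s->used || s->key == key)
            return s;
    }
    return NULL;
}

/* Entry probe: remember when this skb entered the receive path. */
static void on_receive(uintptr_t skb, uint64_t now_ns)
{
    struct slot *s = find_slot(skb);
    if (s) {
        s->key = skb;
        s->value = now_ns;
        s->used = 1;
    }
}

/* Return probe on a drop verdict: replace the start time with the
 * elapsed time. Returns 0 on success, -1 if no start time exists. */
static int on_drop(uintptr_t skb, uint64_t now_ns, uint64_t *delta_ns)
{
    struct slot *s = find_slot(skb);
    if (!s || !s->used)
        return -1;
    *delta_ns = now_ns - s->value;
    s->value = *delta_ns;
    return 0;
}
```

The `-1` path corresponds to the NULL check after `bpf_map_lookup_elem()`: a return probe can fire for a packet whose entry was never recorded, and the verifier rejects programs that dereference the lookup result without checking it.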
Ftrace Tracing
iptables path vs XDP path
iptables path:
$ cat /sys/kernel/debug/tracing/trace
netif_receive_skb_internal() {
  ktime_get_with_offset();
  __netif_receive_skb() {
    __netif_receive_skb_core() {
      ip_rcv() {
        pskb_trim_rcsum_slow();
        nf_hook_slow() {
          iptable_mangle_hook() {
            ipt_do_table() {
              __local_bh_enable_ip();
            }
          }
        }
        ip_rcv_finish() {
          udp_v4_early_demux();
          ip_route_input_noref() {
            ip_route_input_rcu() {
              ip_route_input_slow() {
                fib_table_lookup();
                fib_validate_source() {
                  __fib_validate_source() {
                    fib_table_lookup();
                  }
                }
              }
            }
          }
          ip_local_deliver() {
            nf_hook_slow() {
              iptable_mangle_hook() {
                ipt_do_table() {
                  __local_bh_enable_ip();
                }
              }
              iptable_filter_hook() {
                ipt_do_table() {
                  udp_mt();
                  __local_bh_enable_ip();
                }
              }
              kfree_skb()
DROP
XDP path:
$ cat /sys/kernel/debug/tracing/trace
netif_receive_skb_internal() {
  ktime_get_with_offset();
  do_xdp_generic() {
    pskb_expand_head() {
      __kmalloc_reserve.isra.48() {
        __kmalloc_node_track_caller() {
          kmalloc_slab();
          should_failslab();
        }
      }
      ksize();
      skb_free_head() {
        page_frag_free();
      }
      skb_headers_offset_update();
    }
    __bpf_prog_run32() {
      ___bpf_prog_run();
    }
    kfree_skb()
DROP
YOU WIN !!
“XDP is LOVE”
BPF internals
BPF Infrastructure:
1) Hook points: in-kernel callback points
2) LOAD, ATTACH, CALLBACK
3) Verifier / Interpreter / JIT
4) Map: user-to-kernel shared memory
5) Calling kernel functions through helpers (leveraging)
6) Object pinning: /sys/fs/bpf/…
...
Hook points: callback points
KERNEL SPACE
XDP: at the L2 device driver
tc: L3, just before / just after the device driver (DD)
kprobe: function entry / return
. . .
Inside specific kernel functions:
if (has_bpf_prog)
BPF_PROG_RUN();
->bpf_func(ctx, insni);
bpf_func: the BPF interpreter, or JITed machine code
BPF prog injection !!
HOW ?
[Diagram: loading a BPF program. In user space, tc / ip (using the BPF library in iproute2) or a BPF controller (a user app using the BPF library libbpf for prog/map load, attach, control) hand the program to KERNEL SPACE through the bpf() SYSCALL; Map 1 (shared memory) sits between the two.]
C source (_kern.c)
-> compile with clang / llc
-> BPF program or BPF bytecode (BPF ELF)
In user space:
1. ELF parsing
2. First relocation:
1) map fd
2) bpf to bpf call
3. Loading: bpf() SYSCALL with BPF_PROG_LOAD
BPF prog injection !!
HOW ? in bpf()
BPF LOAD process:
1. BPF prog / map alloc
2. Verifier (loops, memory access ranges)
3. Second relocation:
1) map fd → map ptr
2) helper ID → func addr
4. Select runtime:
1) BPF interpreter func addr
2) BPF func addr after JIT
return fd;
At the hook point:
if (has_bpf_prog)
BPF_PROG_RUN();
->bpf_func(ctx, insni);
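The "select runtime" step and the hook-point call fit together through a single function pointer. The sketch below models this in userspace C. It is illustrative only: `struct prog`, `select_runtime`, and `hook_point` are hypothetical names, and the two stand-in runtimes just read a verdict from the context instead of executing instructions.

```c
#include <assert.h>
#include <stddef.h>

struct insn; /* instruction stream, opaque in this sketch */

/* Loaded program: the hook only ever calls through bpf_func. */
struct prog {
    unsigned int (*bpf_func)(const void *ctx, const struct insn *insni);
    const struct insn *insnsi;
    int jited;
};

static int interp_calls, jit_calls; /* visible for demonstration */

/* Stand-in for the interpreter entry point. */
static unsigned int interp_run(const void *ctx, const struct insn *insni)
{
    (void)insni;
    interp_calls++;
    return *(const unsigned int *)ctx;
}

/* Stand-in for JIT-generated machine code. */
static unsigned int jit_image(const void *ctx, const struct insn *insni)
{
    (void)insni;
    jit_calls++;
    return *(const unsigned int *)ctx;
}

/* Load step 4: point bpf_func at the interpreter or the JITed image. */
static void select_runtime(struct prog *p, int jit_enabled)
{
    p->bpf_func = jit_enabled ? jit_image : interp_run;
    p->jited = jit_enabled ? 1 : 0;
}

/* Hook point: run the attached program if there is one, otherwise
 * continue with the default behaviour. */
static unsigned int hook_point(const struct prog *p, const void *ctx,
                               unsigned int default_verdict)
{
    if (!p) /* has_bpf_prog? */
        return default_verdict;
    assert(p->bpf_func != NULL);
    return p->bpf_func(ctx, p->insnsi);
}
```

Because the choice is made once at load time, the hook itself stays a cheap indirect call and does not need to know whether the program was JIT-compiled.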
[Diagram: the loaded BPF programs and maps (Map 1, Map 2, ...; shared memory) in KERNEL SPACE, with tc / ip (BPF library in iproute2) and the BPF controller (user app using the BPF library libbpf) in user space, connected through the bpf() SYSCALL.]
Various ways to ATTACH a BPF program:
- socket(), send() AF_NETLINK
- bpf() syscall BPF_PROG_ATTACH
BPF_RAW_TRACEPOINT_OPEN
- kprobe event id, ioctl()
PERF_EVENT_IOC_SET_BPF
...
BPF CALLBACK !!
[Diagram: once attached, the BPF programs in KERNEL SPACE are invoked as callbacks (Callback !!) at their hook points.]
Calling kernel functions through BPF helper functions (leveraging) !!
[Diagram: BPF programs reach kernel functions (func()) only through helpers.]
Sharing memory between user space and the kernel through BPF maps
[Diagram: BPF Controller 1 and BPF Controller 2 (user apps using the BPF library libbpf for prog/map load, attach, control) and the BPF programs access Map 1, Map 2, ... (shared memory).]
BPF Architecture:
[Diagram: the complete picture: user-space controllers and ip / tc (BPF library in iproute2) above the bpf() SYSCALL; BPF programs, helpers, and maps in KERNEL SPACE.]
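XDP internals
XDP RETURN TYPE
XDP_ABORTED
XDP_DROP
XDP_PASS
XDP_TX
XDP_REDIRECT
[Diagram: a BPF program in the network device driver returns a verdict: XDP_DROP discards the packet, XDP_PASS hands it up the stack towards the APP, XDP_TX and XDP_REDIRECT send it back out.]
Generic XDP
vs
Driver XDP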
XDP GENERIC PATH
[Diagram: the normal path with the BPF program inserted on the receive side, in the position the normal-path diagram labels "L3 input", ahead of TC ingress; the rest of the stack (PREROUTING, ROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING, TC egress, L3 output) is unchanged.]
DRIVER XDP PATH
[Diagram: the BPF program runs at L2, directly after RX; it can REDIRECT the packet straight to TX or PASS it up through L3, L4, and L7 to the APP.]
Driver XDP vs Generic XDP
[Diagram: side-by-side comparison of the two. Both show RX, BPF, REDIRECT to TX, and PASS; the generic XDP side places the BPF program at the L3 level.]
XDP data structures and the SKB
[Diagram: packet buffer layout: HEADROOM, MAC HEADER, IP HEADER, ..., TAIL / END, TAILROOM. Pointers shown: xdp->data_hard_start, xdp->data_meta, xdp->data, and skb->data; the xdp_frame is also marked.]
Permitted DATA ACCESS range
[Diagram: the same buffer layout (xdp->data_hard_start, xdp->data_meta, xdp->data, HEADROOM, MAC HEADER, IP HEADER, TAIL / END, TAILROOM) with the range a BPF program may access highlighted.]
XDP_REDIRECT analysis
[Diagram: a BPF program serving eth0, eth1, eth2, and eth3, with the APP above. XDP_TX sends a packet back out of the device it arrived on; XDP_REDIRECT forwards it to another device selected through a REDIRECT MAP.]
XDP_REDIRECT via bpf_redirect()
About bpf_redirect()
XDP_REDIRECT - bulkTX
[Diagram: between RX and TX, REDIRECT verdicts from the BPF program queue xdp_frames per map entry; the queued xdp_frames are then transmitted in bulk (bulkTX).]
DEVMAP
[Diagram: bpf_redirect_map records the redirect info; the DEVMAP maps keys to devices, and the redirected xdp_frames are queued for TX.]
Key  Value (Device)
0    X
1    X
2    X
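A userspace model can make the two ideas on these slides concrete: the key-to-device lookup behind a redirect map, and the per-destination queue that is flushed in bulk. This is a sketch under stated assumptions, not kernel code: the sizes, struct names, and function names are hypothetical, transmission is modelled by a counter, and returning XDP_ABORTED for a failed lookup is an assumption of this model (kernel behaviour has differed between versions).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { XDP_ABORTED = 0, XDP_REDIRECT = 4 };

#define DEVMAP_ENTRIES 4 /* hypothetical map size */
#define BULK_SIZE 8      /* hypothetical batch size */

/* Frames waiting to be sent to one destination device. */
struct bulk_queue {
    const void *frames[BULK_SIZE];
    size_t count;   /* frames currently queued */
    size_t flushed; /* frames handed to the device so far */
};

struct devmap {
    int ifindex[DEVMAP_ENTRIES]; /* 0 means "empty slot" */
    struct bulk_queue q[DEVMAP_ENTRIES];
};

/* Key -> device lookup, as a redirect-map helper would perform it. */
static int redirect_map(const struct devmap *m, uint32_t key,
                        int *out_ifindex)
{
    if (key >= DEVMAP_ENTRIES || m->ifindex[key] == 0)
        return XDP_ABORTED;
    *out_ifindex = m->ifindex[key];
    return XDP_REDIRECT;
}

/* Hand every queued frame to the device in one go. */
static void bulk_flush(struct bulk_queue *q)
{
    q->flushed += q->count;
    q->count = 0;
}

/* Queue one frame; a full queue is flushed before the frame is added. */
static void bulk_enqueue(struct bulk_queue *q, const void *frame)
{
    assert(q->count <= BULK_SIZE);
    if (q->count == BULK_SIZE)
        bulk_flush(q);
    q->frames[q->count++] = frame;
}
```

Batching is the point of the bulkTX slide: the per-packet cost of handing frames to the driver is paid once per batch instead of once per frame. A final `bulk_flush` is still needed at the end of a receive burst so that a partly filled queue does not wait indefinitely.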
CPUMAP
[Diagram: as for DEVMAP, but the map values are CPUs. bpf_redirect_map records the redirect info, the redirected xdp_frames are queued for the target CPU, and they enter the stack there through netif_receive_skb_core instead of going to TX.]
Key  Value (CPU)
0    X
1    X
2    X
REDIRECT in GENERIC_XDP
BPFILTER
Additional Topics:
● memory model switching
○ /net/core/xdp.c
● page pool
○ /net/core/page_pool
● offload
● AF_XDP && XSK(XDP SOCKET)
● helper functions
● Device Driver
Additional Topics:
● Verifier
○ CFG, DAG, register, memory check...
● Other types
○ TC, SOCKET FILTER, CGROUP
● BTF
○ ELFutils, clang -g, llc -mattr=dwarfris
● Tail call
○ related to bpf_prog_array
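To give a feel for the kind of checks listed under "Verifier", here is a toy verifier in userspace C. It is a deliberately tiny sketch with rules chosen for this example: valid register numbers, no backward jumps (so the control flow graph is a DAG and cannot loop), jump targets inside the program, and an exit as the last instruction. `toy_verify` and the constant names are hypothetical, and the real verifier does far more (register state tracking, memory access checks, and, in newer kernels, bounded loops).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct insn {
    uint8_t code;
    uint8_t dst_reg : 4;
    uint8_t src_reg : 4;
    int16_t off;
    int32_t imm;
};
static_assert(sizeof(struct insn) == 8, "instruction must be 64 bits");

enum {
    CLS_JMP = 0x05, /* jump class */
    OP_CALL = 0x80, /* helper call: off is not a jump target */
    OP_EXIT = 0x90,
    MAX_REG = 10,   /* R0..R10 */
};

/* Returns 0 if prog passes the toy checks, -1 otherwise. */
static int toy_verify(const struct insn *prog, size_t len)
{
    if (len == 0)
        return -1;
    for (size_t pc = 0; pc < len; pc++) {
        const struct insn *i = &prog[pc];
        if (i->dst_reg > MAX_REG || i->src_reg > MAX_REG)
            return -1; /* register check */
        if ((i->code & 0x07) != CLS_JMP)
            continue;
        unsigned op = i->code & 0xf0;
        if (op == OP_CALL || op == OP_EXIT)
            continue;
        if (i->off < 0)
            return -1; /* back-edge: possible loop */
        /* Jump targets are relative to the next instruction. */
        if (pc + 1 + (size_t)i->off >= len)
            return -1; /* target outside the program */
    }
    /* With forward-only jumps, ending in EXIT means every path
     * terminates instead of running off the end. */
    return prog[len - 1].code == (CLS_JMP | OP_EXIT) ? 0 : -1;
}
```

Rejecting programs before they run is what lets the kernel execute them without run-time guards at every instruction.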
Additional Topics:
● Facebook's Katran
○ L4 load balancing
○ https://github.com/facebookincubator/katran
● Suricata
○ IPS/IDS engine
○ https://suricata-ids.org/
● Cilium
○ https://cilium.io/
● IOvisor bcc
○ https://www.iovisor.org/
● IR Decoding
○ https://lwn.net/Articles/759188/
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)
Wonjun Hwang
 

Recently uploaded (20)

Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)
 

BPF / XDP August Seminar, KossLab

  • 1. Linux Kernel - BPF / XDP KossLab 유태희, 송태웅
  • 2. What is BPF? 1. Berkeley Packet Filter, since 1992 2. Kernel infrastructure
  • 3. What is BPF? 1. Berkeley Packet Filter, since 1992 2. Kernel infrastructure: - Interpreter (in-kernel virtual machine) - Hook points (in-kernel callback points) - Map - Helper
  • 4. What is BPF? “Safe dynamic programs and tools”: a technique for safely injecting code into the kernel at runtime
  • 5. BPF Infrastructure: a strategy for safe code injection 1) Use BPF instructions instead of native machine code 2) Use the verifier to check for hazards in advance 3) When an (existing) kernel function is needed, call it only through a helper function
  • 6. BPF Infrastructure: a strategy for safe code injection 1) Use BPF instructions instead of native machine code
  • 7. BPF Infrastructure: a strategy for safe code injection 2) Use the verifier to check for hazards in advance
  • 8. BPF Infrastructure: a strategy for safe code injection 3) When an (existing) kernel function is needed, call it only through a helper function
  • 9. BPF Infrastructure: the building blocks for safe code injection Kernel += BPF interpreter (in-kernel virtual machine) + Verifier + BPF helper functions (leveraging kernel functions) + BPF syscall (prog/map: loading, attaching, etc.)
  • 10. BPF Instruction set: 1) A junior x86 instruction set, a “simplified x86” (note: PLUMgrid's x86 bytecode verifier failed) 2) BPF = classic BPF:10% + x86:70% + arm64:25% + risc:5% 3) Fixed-size instruction encoding (for high interpreter speed) 4) Simplified, so hazards are easier to anticipate and prevent (the verifier checks loops, memory access ranges, etc.) 5) Architecture-independent
  • 11. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8 $ cat include/uapi/linux/bpf.h [...] struct bpf_insn { __u8 code; /* opcode */ __u8 dst_reg:4; /* dest register */ __u8 src_reg:4; /* source register */ __s16 off; /* signed offset */ __s32 imm; /* signed immediate constant */ }; [...]
  • 12. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8 class:4 + fields:4 + fields:4 eBPF: include/uapi/linux/bpf.h cBPF: include/uapi/linux/bpf_common.h
  • 13. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8 class:4 + LD/ST fields:4 + ALU/JMP fields:4 eBPF: include/uapi/linux/bpf.h cBPF: include/uapi/linux/bpf_common.h LD/ST classes: 0x00 ~ 0x03 ALU/JMP classes: 0x04 ~ 0x07
  • 14. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8 class:4 + LD/ST fields:4 + ALU/JMP fields:4 eBPF: include/uapi/linux/bpf.h cBPF: include/uapi/linux/bpf_common.h LD/ST classes: 0x00 ~ 0x03 ALU/JMP classes: 0x04 ~ 0x07
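
As a rough illustration of the layout on slides 11 to 14, the sketch below mirrors struct bpf_insn in plain C and splits an opcode byte into its class. The names `struct insn`, `insn_class`, and `is_load_store` are made up for this example so that it builds without kernel headers. Only the field layout and the 0x07 class mask (the kernel's BPF_CLASS macro) are taken from the uapi headers.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of struct bpf_insn (include/uapi/linux/bpf.h).
 * Hypothetical name; the field order and widths follow the kernel header. */
struct insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed immediate constant */
};

/* Every instruction has the same fixed size: 8 + 4 + 4 + 16 + 32 bits. */
static_assert(sizeof(struct insn) == 8, "BPF instructions are 8 bytes");

/* The instruction class lives in the low three bits of the opcode. */
static inline unsigned insn_class(uint8_t code)
{
    return code & 0x07;
}

/* Classes 0x00..0x03 are the load/store family; 0x04..0x07 are ALU/JMP. */
static inline int is_load_store(uint8_t code)
{
    return insn_class(code) <= 0x03;
}
```

For instance, 0xb7 (the opcode that BPF_MOV64_IMM expands to, worked out by hand from the header constants) has class 0x07, while 0x63 (a 32-bit BPF_STX_MEM) falls in the LD/ST range.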
  • 15. BPF Instruction set: struct bpf_insn prog[] = { BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */), BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */ BPF_LD_MAP_FD(BPF_REG_1, map_fd), BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */ BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */ BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */ BPF_EXIT_INSN(), }; https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/samples/bpf/sock_example.c
  • 16. BPF helper functions: $ grep BPF_CALL kernel/bpf/helpers.c: BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key) BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key, [...] kernel/trace/bpf_trace.c: BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc) BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr) BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src, BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1, [...] net/core/filter.c: BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb) BPF_CALL_3(bpf_skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x) [...]
  • 17. BPF as a kernel subproject “Safe dynamic programs and tools” $ cat MAINTAINERS | grep -A 3 BPF BPF (Safe dynamic programs and tools) M: Alexei Starovoitov <ast@kernel.org> M: Daniel Borkmann <daniel@iogearbox.net> L: netdev@vger.kernel.org [...]
  • 18. “Safe dynamic programs and tools” $ cat MAINTAINERS | grep -A 27 BPF BPF (Safe dynamic programs and tools) [...] F: arch/x86/net/bpf_jit* [...] F: kernel/bpf/ F: kernel/trace/bpf_trace.c [...] F: net/core/filter.c F: net/sched/act_bpf.c F: net/sched/cls_bpf.c [...] [...] F: samples/bpf/ F: tools/bpf/ F: tools/lib/bpf/ F: tools/testing/selftests/bpf/ BPF as a kernel subproject
  • 19. $ cat MAINTAINERS | grep -A 27 BPF BPF (Safe dynamic programs and tools) [...] F: arch/x86/net/bpf_jit* [...] F: kernel/bpf/ F: kernel/trace/bpf_trace.c [...] F: net/core/filter.c F: net/sched/act_bpf.c F: net/sched/cls_bpf.c [...] [...] F: samples/bpf/ F: tools/bpf/ F: tools/lib/bpf/ F: tools/testing/selftests/bpf/ Archs with JIT support: x86, arm, arm64, sparc, s390, powerpc, mips “Safe dynamic programs and tools” BPF as a kernel subproject
  • 20. “Safe dynamic programs and tools” $ cat MAINTAINERS | grep -A 27 BPF BPF (Safe dynamic programs and tools) [...] F: arch/x86/net/bpf_jit* [...] F: kernel/bpf/ F: kernel/trace/bpf_trace.c [...] F: net/core/filter.c F: net/sched/act_bpf.c F: net/sched/cls_bpf.c [...] [...] F: samples/bpf/ F: tools/bpf/ F: tools/lib/bpf/ F: tools/testing/selftests/bpf/ BPF core: Syscall, Interpreter, Verifier, Generic Helpers, Maps, ... BPF as a kernel subproject
  • 21. “Safe dynamic programs and tools” $ cat MAINTAINERS | grep -A 27 BPF BPF (Safe dynamic programs and tools) [...] F: arch/x86/net/bpf_jit* [...] F: kernel/bpf/ F: kernel/trace/bpf_trace.c [...] F: net/core/filter.c F: net/sched/act_bpf.c F: net/sched/cls_bpf.c [...] [...] F: samples/bpf/ F: tools/bpf/ F: tools/lib/bpf/ F: tools/testing/selftests/bpf/ Hook points, Specific Helpers ... For cBPF, ... BPF as a kernel subproject
  • 22. “Safe dynamic programs and tools” $ cat MAINTAINERS | grep -A 27 BPF BPF (Safe dynamic programs and tools) [...] F: arch/x86/net/bpf_jit* [...] F: kernel/bpf/ F: kernel/trace/bpf_trace.c [...] F: net/core/filter.c F: net/sched/act_bpf.c F: net/sched/cls_bpf.c [...] [...] F: samples/bpf/ F: tools/bpf/ F: tools/lib/bpf/ F: tools/testing/selftests/bpf/ bpf loading(lib), bpf tool, test codes, samples, ... BPF as a kernel subproject
  • 23. BPF Infrastructure: support for using BPF programs 1) Hook points (in-kernel callback points) 2) Map (user-to-kernel shared memory) 3) Leveraging kernel functions through helper calls 4) Object pinning (/sys/fs/bpf/...)
  • 24. KERNEL SPACE bpf() SYSCALL BPF Controller 1 (User App) ip tc Map 1 (Shared memory) Map 2 (Shared memory) . . BPF BPF BPF func(): Helper func() func() func() BPF library in-iproute2 BPF Controller 2 (User App) . . . . . . BPF Architecture: BPF library: libbpf prog/map load, attach, control
  • 25. XDP
  • 30. XDP == FAST PATH
  • 31. NORMAL PATH TX APP L7 RX L3 input TC Ingress PREROUTING ROUTING TCP/UDP FORWARD ROUTING INPUT OUTPUT POSTROUTING TC egress L3 output L4 L3 L3 DD
  • 32. NORMAL PATH TX APP L7 RX L3 input TC Ingress PREROUTING ROUTING TCP/UDP FORWARD ROUTING INPUT OUTPUT POSTROUTING TC egress L3 output L4 L3 L3 DD
  • 35. Prerequisites 1. One build machine 2. One test machine (x86 recommended) 3. Kernel source code 4. clang + llvm (compiler) 5. bpftool (BPF program loader) 6. An iproute2 package with BPF support
  • 37. Kernel source code: the bpf tree on git.kernel.org https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
  • 40. samples/bpf examples: kernel source code and BPF sample code
  • 41. samples/bpf example (xdp_rxq_info_kern.c): analyzing sample code in the kernel source
  • 42. Building samples/bpf: hands-on BPF program compilation
  • 43. Loading the program: $ mount bpffs /sys/fs/bpf -t bpf $ bpftool prog load ./xdp_rxq_info_kern.o /sys/fs/bpf/xdp
  • 44. Inspecting the program: $ ls /sys/fs/bpf/ $ ./bpftool prog list $ ./bpftool prog dump xlated id X $ ./bpftool prog dump jited id X
  • 45. Attaching the XDP program: $ ip link set dev lo xdp pin /sys/fs/bpf/xdp
  • 46. Checking the XDP program attachment: $ ip link show dev lo
  • 47. Removing the XDP program: $ ip link set dev lo xdp off $ rm /sys/fs/bpf/xdp
  • 51. #PC2 $ ping 192.168.4.1 #PC1 $ iptables -A INPUT -s 192.168.4.2 -d 192.168.4.1 -p icmp -j DROP
  • 52. NORMAL PATH TX APP L7 RX L3 input TC Ingress PREROUTING ROUTING TCP/UDP FORWARD ROUTING INPUT OUTPUT POSTROUTING TC egress L3 output L4 L3 L3 DD
  • 53. NORMAL PATH TX APP L7 RX L3 input TC Ingress PREROUTING ROUTING TCP/UDP FORWARD ROUTING INPUT OUTPUT POSTROUTING TC egress L3 output L4 L3 L3 DD DROP
  • 55. Loading and attaching the XDP program: $ ./bpftool prog load ./xdp_icmp.o /sys/fs/bpf/xdp_icmp $ ip link set dev lo xdp pin /sys/fs/bpf/xdp_icmp
  • 56. XDP GENERIC PATH TX APP L7 RX BPF TC Ingress PREROUTING ROUTING TCP/UDP FORWARD ROUTING INPUT OUTPUT POSTROUTING TC egress L3 output L4 L3 L3 DD DROP
  • 59. netif_receive_skb_internal() ipt_do_table() Long time !! ~~ DROP BPF Tracing: iptables - DROP case
  • 61. netif_receive_skb_internal() do_xdp_generic() Short time !! ~~ DROP BPF Tracing: XDP - DROP case
  • 62. netif_receive_skb_internal() ipt_do_table() do_xdp_generic() Short time !! ~~ BPF Tracing: iptables vs XDP - DROP case DROP DROP Long time !! ~~
  • 63. BPF Tracing: iptables vs XDP - DROP case net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) DROP DROP
  • 64. net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case
  • 65. net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF BPF Beginning point: BPF ATTACH !! BPF Return point: BPF ATTACH !! Return point: BPF ATTACH !!
  • 66. net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF BPF BPF SEC("kprobe/netif_receive_skb_internal") int bpf_trace_receive_skb(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM1(ctx); u64 start_time = bpf_ktime_get_ns(); bpf_map_update_elem(&tracing_map, &skb_ptr, &start_time, BPF_ANY); return 0; }
  • 67. BPF BPF net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF SEC("kprobe/netif_receive_skb_internal") int bpf_trace_receive_skb(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM1(ctx); u64 start_time = bpf_ktime_get_ns(); bpf_map_update_elem(&tracing_map, &skb_ptr, &start_time, BPF_ANY); return 0; }
  • 68. BPF BPF net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF SEC("kretprobe/do_xdp_generic") int bpf_trace_xdp_drop(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM2(ctx); int action = PT_REGS_RC(ctx); if (action == XDP_DROP) { u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr); if (!time) return 0; u64 cur_time = bpf_ktime_get_ns(); u64 delta = cur_time - *time; *time = delta; ...
  • 69. BPF BPF net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF SEC("kretprobe/do_xdp_generic") int bpf_trace_xdp_drop(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM2(ctx); int action = PT_REGS_RC(ctx); if (action == XDP_DROP) { u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr); if (!time) return 0; u64 cur_time = bpf_ktime_get_ns(); u64 delta = cur_time - *time; *time = delta; ...
  • 70. BPF BPF net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF SEC("kretprobe/ipt_do_table") int bpf_trace_iptables_drop(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM1(ctx); int action = PT_REGS_RC(ctx); if (action == NF_DROP) { u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr); if (!time) return 0; u64 cur_time = bpf_ktime_get_ns(); u64 delta = cur_time - *time; *time = delta; ...
  • 71. BPF BPF net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF SEC("kretprobe/ipt_do_table") int bpf_trace_iptables_drop(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM1(ctx); int action = PT_REGS_RC(ctx); if (action == NF_DROP) { u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr); if (!time) return 0; u64 cur_time = bpf_ktime_get_ns(); u64 delta = cur_time - *time; *time = delta; ...
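
The measurement idea on slides 66 to 71 (store a timestamp at function entry keyed by the skb pointer, then turn it into a latency at the drop point) can be simulated in plain userspace C. This is only a sketch, not a BPF program: `struct tmap`, `tmap_lookup`, `tmap_update`, `on_entry`, and `on_drop` are hypothetical names, and the small open-addressing table merely stands in for a BPF hash map. It makes explicit that a lookup can miss, which is why a BPF program has to NULL-check the pointer returned by bpf_map_lookup_elem before dereferencing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64 /* arbitrary capacity for the sketch */

struct slot { uintptr_t key; uint64_t val; int used; };
struct tmap { struct slot s[SLOTS]; };

/* Returns a pointer to the stored value, or NULL when the key is absent. */
static inline uint64_t *tmap_lookup(struct tmap *m, uintptr_t key)
{
    for (size_t i = 0; i < SLOTS; i++) {
        struct slot *s = &m->s[(key + i) % SLOTS];
        if (!s->used)
            return NULL;
        if (s->key == key)
            return &s->val;
    }
    return NULL;
}

/* Inserts or overwrites key; returns -1 when the table is full. */
static inline int tmap_update(struct tmap *m, uintptr_t key, uint64_t val)
{
    for (size_t i = 0; i < SLOTS; i++) {
        struct slot *s = &m->s[(key + i) % SLOTS];
        if (!s->used || s->key == key) {
            s->used = 1;
            s->key = key;
            s->val = val;
            return 0;
        }
    }
    return -1;
}

/* Entry probe: remember when this skb was first seen. */
static inline void on_entry(struct tmap *m, uintptr_t skb, uint64_t now_ns)
{
    tmap_update(m, skb, now_ns);
}

/* Return probe on the drop path: replace the start time with the delta.
 * Returns -1 if no entry timestamp was recorded for this skb. */
static inline int on_drop(struct tmap *m, uintptr_t skb, uint64_t now_ns,
                          uint64_t *delta_ns)
{
    assert(delta_ns != NULL);
    uint64_t *start = tmap_lookup(m, skb);
    if (!start)
        return -1;
    *delta_ns = now_ns - *start;
    *start = *delta_ns;
    return 0;
}
```

Calling `on_entry` at time 100 and `on_drop` at time 350 for the same key reports a delta of 250, while a drop for a key that was never recorded is rejected instead of being dereferenced.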
  • 73. $ cat /sys/kernel/debug/tracing/trace netif_receive_skb_internal() { ktime_get_with_offset(); __netif_receive_skb() { __netif_receive_skb_core() { ip_rcv() { pskb_trim_rcsum_slow(); nf_hook_slow() { iptable_mangle_hook() { ipt_do_table() { __local_bh_enable_ip(); } } } ip_rcv_finish() { udp_v4_early_demux(); ip_route_input_noref() { ip_route_input_rcu() { ip_route_input_slow() { fib_table_lookup(); fib_validate_source() { __fib_validate_source() { fib_table_lookup(); } } } } } ip_local_deliver() { nf_hook_slow() { iptable_mangle_hook() { ipt_do_table() { __local_bh_enable_ip(); } } iptable_filter_hook() { ipt_do_table() { udp_mt(); __local_bh_enable_ip(); } } kfree_skb()
  • 74. $ cat /sys/kernel/debug/tracing/trace netif_receive_skb_internal() { ktime_get_with_offset(); __netif_receive_skb() { __netif_receive_skb_core() { ip_rcv() { pskb_trim_rcsum_slow(); nf_hook_slow() { iptable_mangle_hook() { ipt_do_table() { __local_bh_enable_ip(); } } } ip_rcv_finish() { udp_v4_early_demux(); ip_route_input_noref() { ip_route_input_rcu() { ip_route_input_slow() { fib_table_lookup(); fib_validate_source() { __fib_validate_source() { fib_table_lookup(); } } } } } ip_local_deliver() { nf_hook_slow() { iptable_mangle_hook() { ipt_do_table() { __local_bh_enable_ip(); } } iptable_filter_hook() { ipt_do_table() { udp_mt(); __local_bh_enable_ip(); } } kfree_skb() DROP
  • 75. $ cat /sys/kernel/debug/tracing/trace netif_receive_skb_internal() { ktime_get_with_offset(); __netif_receive_skb() { __netif_receive_skb_core() { ip_rcv() { pskb_trim_rcsum_slow(); nf_hook_slow() { iptable_mangle_hook() { ipt_do_table() { __local_bh_enable_ip(); } } } ip_rcv_finish() { udp_v4_early_demux(); ip_route_input_noref() { ip_route_input_rcu() { ip_route_input_slow() { fib_table_lookup(); fib_validate_source() { __fib_validate_source() { fib_table_lookup(); } } } } } ip_local_deliver() { nf_hook_slow() { iptable_mangle_hook() { ipt_do_table() { __local_bh_enable_ip(); } } iptable_filter_hook() { ipt_do_table() { udp_mt(); __local_bh_enable_ip(); } } kfree_skb() netif_receive_skb_internal() { ktime_get_with_offset(); do_xdp_generic() { pskb_expand_head() { __kmalloc_reserve.isra.48() { __kmalloc_node_track_caller() { kmalloc_slab(); should_failslab(); } } ksize(); skb_free_head() { page_frag_free(); } skb_headers_offset_update(); } __bpf_prog_run32() { ___bpf_prog_run(); } kfree_skb() DROP DROP
  • 76. $ cat /sys/kernel/debug/tracing/trace netif_receive_skb_internal() { ktime_get_with_offset(); __netif_receive_skb() { __netif_receive_skb_core() { ip_rcv() { pskb_trim_rcsum_slow(); nf_hook_slow() { iptable_mangle_hook() { ipt_do_table() { __local_bh_enable_ip(); } } } ip_rcv_finish() { udp_v4_early_demux(); ip_route_input_noref() { ip_route_input_rcu() { ip_route_input_slow() { fib_table_lookup(); fib_validate_source() { __fib_validate_source() { fib_table_lookup(); } } } } } ip_local_deliver() { nf_hook_slow() { iptable_mangle_hook() { ipt_do_table() { __local_bh_enable_ip(); } } iptable_filter_hook() { ipt_do_table() { udp_mt(); __local_bh_enable_ip(); } } kfree_skb() netif_receive_skb_internal() { ktime_get_with_offset(); do_xdp_generic() { pskb_expand_head() { __kmalloc_reserve.isra.48() { __kmalloc_node_track_caller() { kmalloc_slab(); should_failslab(); } } ksize(); skb_free_head() { page_frag_free(); } skb_headers_offset_update(); } __bpf_prog_run32() { ___bpf_prog_run(); } kfree_skb() DROP DROP
  • 77. $ cat /sys/kernel/debug/tracing/trace netif_receive_skb_internal() { ktime_get_with_offset(); __netif_receive_skb() { __netif_receive_skb_core() { ip_rcv() { pskb_trim_rcsum_slow(); nf_hook_slow() { iptable_mangle_hook() { ipt_do_table() { __local_bh_enable_ip(); } } } ip_rcv_finish() { udp_v4_early_demux(); ip_route_input_noref() { ip_route_input_rcu() { ip_route_input_slow() { fib_table_lookup(); fib_validate_source() { __fib_validate_source() { fib_table_lookup(); } } } } } ip_local_deliver() { nf_hook_slow() { iptable_mangle_hook() { ipt_do_table() { __local_bh_enable_ip(); } } iptable_filter_hook() { ipt_do_table() { udp_mt(); __local_bh_enable_ip(); } } kfree_skb() netif_receive_skb_internal() { ktime_get_with_offset(); do_xdp_generic() { pskb_expand_head() { __kmalloc_reserve.isra.48() { __kmalloc_node_track_caller() { kmalloc_slab(); should_failslab(); } } ksize(); skb_free_head() { page_frag_free(); } skb_headers_offset_update(); } __bpf_prog_run32() { ___bpf_prog_run(); } kfree_skb() DROP DROP YOU WIN !! “XDP is LOVE”
  • 79. BPF Infrastructure: 1) Hook points (in-kernel callback points) 2) LOAD, ATTACH, CALLBACK 3) Verifier / Interpreter / JIT 4) Map (user-to-kernel shared memory) 5) Leveraging kernel functions through helper calls 6) Object pinning (/sys/fs/bpf/…) ...
  • 80. Hook points: callback points KERNEL SPACE XDP: at the L2 device driver tc: just before / just after the L3 DD kprobe: function entry / return . . . . . .
  • 81. Hook points: callback points KERNEL SPACE XDP: at the L2 device driver tc: just before / just after the L3 DD kprobe: function entry / return . . . . . . Inside a specific kernel function: if (has_bpf_prog) BPF_PROG_RUN(); ->bpf_func(ctx, insnsi);
  • 82. Hook points: callback points KERNEL SPACE XDP: at the L2 device driver kprobe: function entry / return . . . . . . BPF BPF BPF BPF prog injection !! tc: just before / just after the L3 DD Inside a specific kernel function: if (has_bpf_prog) BPF_PROG_RUN(); ->bpf_func(ctx, insnsi);
  • 83. Hook points: callback points KERNEL SPACE XDP: at the L2 device driver kprobe: function entry / return . . . . . . BPF BPF BPF BPF prog injection !! tc: just before / just after the L3 DD Inside a specific kernel function: if (has_bpf_prog) BPF_PROG_RUN(); ->bpf_func(ctx, insnsi); (BPF interpreter or JITed machine code)
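
The hook-point pattern on slides 81 to 83 boils down to "if a program is attached, call it through a function pointer". A minimal userspace sketch of that shape follows. All names here (`struct hook`, `hook_attach`, `hook_run`, `drop_icmp`, the `verdict` enum) are hypothetical stand-ins, not kernel APIs, and the example program assumes the packet buffer starts at an IPv4 header, where the protocol field sits at byte offset 9 and ICMP is protocol 1.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { VERDICT_PASS = 0, VERDICT_DROP = 1 };

struct packet {
    const unsigned char *data;
    size_t len;
};

/* Plays the role of bpf_func: the interpreter entry or JITed code. */
typedef enum verdict (*prog_fn)(const struct packet *ctx);

struct hook {
    prog_fn prog; /* NULL while nothing is attached */
};

static inline void hook_attach(struct hook *h, prog_fn fn)
{
    assert(fn != NULL);
    h->prog = fn;
}

/* Mirrors "if (has_bpf_prog) BPF_PROG_RUN();" from the slide. */
static inline enum verdict hook_run(const struct hook *h,
                                    const struct packet *pkt)
{
    if (h->prog)
        return h->prog(pkt);
    return VERDICT_PASS; /* no program attached: normal path */
}

/* Example "program": drop ICMP, pass everything else. */
static inline enum verdict drop_icmp(const struct packet *pkt)
{
    if (pkt->len <= 9) /* bounds check before touching the data */
        return VERDICT_PASS;
    return pkt->data[9] == 1 ? VERDICT_DROP : VERDICT_PASS;
}
```

With nothing attached, `hook_run` lets every packet through. After `hook_attach(&h, drop_icmp)`, a buffer whose protocol byte is 1 is dropped and a UDP one (protocol 17) still passes. The explicit length check is there because a real BPF program must bound every packet access the same way for the verifier to accept it.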
  • 84. Hook points: callback points KERNEL SPACE XDP: at the L2 device driver kprobe: function entry / return . . . . . . BPF BPF BPF BPF prog injection !! HOW ? tc: just before / just after the L3 DD
  • 85. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL C source _kern.c clang / llc compile BPF program or BPF bytecode BPF ELF
  • 86. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL C source _kern.c clang / llc compile BPF program or BPF bytecode BPF ELF 1. ELF parsing 2. First-pass relocation: 1) map fd 2) bpf-to-bpf call
  • 87. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL C source _kern.c clang / llc compile BPF program or BPF bytecode BPF ELF 1. ELF parsing 2. First-pass relocation: 1) map fd 2) bpf-to-bpf call Map 1 (Shared memory)
  • 88. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL C source _kern.c clang / llc compile BPF program or BPF bytecode BPF ELF 1. ELF parsing 2. First-pass relocation: 1) map fd 2) bpf-to-bpf call 3. Loading BPF BPF BPF prog injection !! BPF_PROG_LOAD Map 1 (Shared memory)
  • 89. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL C source _kern.c clang / llc compile BPF program or BPF bytecode BPF ELF 1. ELF parsing 2. First-pass relocation: 1) map fd 2) bpf-to-bpf call 3. Loading BPF BPF BPF prog injection !! BPF Controller (User App) BPF library: libbpf prog/map load, attach, control . . . Map 1 (Shared memory)
  • 90. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL C source _kern.c clang / llc compile BPF program or BPF bytecode BPF ELF 1. ELF parsing 2. First-pass relocation: 1) map fd 2) bpf-to-bpf call 3. Loading BPF BPF BPF prog injection !! BPF Controller (User App) BPF library: libbpf prog/map load, attach, control . . . HOW ? in bpf() Map 1 (Shared memory)
  • 91. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL C source _kern.c clang / llc compile BPF program or BPF bytecode BPF ELF 1. ELF parsing 2. First-pass relocation: 1) map fd 2) bpf-to-bpf call 3. Loading BPF Controller (User App) BPF library: libbpf prog/map load, attach, control . . . BPF LOAD steps: 1. BPF prog / map alloc 2. Verifier (loops, memory access ranges) Map 1 (Shared memory)
  • 92. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL C source _kern.c clang / llc compile BPF program or BPF bytecode BPF ELF 1. ELF parsing 2. First-pass relocation: 1) map fd 2) bpf-to-bpf call 3. Loading BPF Controller (User App) BPF library: libbpf prog/map load, attach, control . . . BPF LOAD steps: 1. BPF prog / map alloc 2. Verifier (loops, memory access ranges) 3. Second-pass relocation: 1) map fd → map ptr 2) helper ID → func addr Map 1 (Shared memory)
  • 93. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL C source _kern.c clang / llc compile BPF program or BPF bytecode BPF ELF 1. ELF parsing 2. First-pass relocation: 1) map fd 2) bpf-to-bpf call 3. Loading BPF Controller (User App) BPF library: libbpf prog/map load, attach, control . . . BPF LOAD steps: 1. BPF prog / map alloc 2. Verifier (loops, memory access ranges) 3. Second-pass relocation: 1) map fd → map ptr 2) helper ID → func addr 4. Select runtime: 1) BPF interpreter func addr 2) BPF func addr after JIT return fd; Map 1 (Shared memory)
  • 94. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL C source _kern.c clang / llc compile BPF program or BPF bytecode BPF ELF 1. ELF parsing 2. First-pass relocation: 1) map fd 2) bpf-to-bpf call 3. Loading BPF Controller (User App) BPF library: libbpf prog/map load, attach, control . . . BPF LOAD steps: 1. BPF prog / map alloc 2. Verifier (loops, memory access ranges) 3. Second-pass relocation: 1) map fd → map ptr 2) helper ID → func addr 4. Select runtime: 1) BPF interpreter func addr 2) BPF func addr after JIT if (has_bpf_prog) BPF_PROG_RUN(); ->bpf_func(ctx, insnsi); return fd;
  • 95. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL C source _kern.c clang / llc compile BPF program or BPF bytecode BPF ELF 1. ELF parsing 2. First-pass relocation: 1) map fd 2) bpf-to-bpf call 3. Loading BPF BPF BPF Controller (User App) BPF library: libbpf prog/map load, attach, control . . . Map 1 (Shared memory) Map 2 (Shared memory) . . . Various BPF ATTACH methods: - sock(), send() AF_NETLINK - bpf() syscall BPF_PROG_ATTACH BPF_RAW_TRACEPOINT_OPEN - kprobe event id, ioctl() PERF_EVENT_IOC_SET_BPF ...
  • 96. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL BPF BPF BPF Controller (User App) BPF library: libbpf prog/map load, attach, control . . . Map 1 (Shared memory) Map 2 (Shared memory) . . . BPF Callback !! Callback !! BPF CALLBACK !!
  • 97. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL BPF BPF BPF Controller (User App) BPF library: libbpf prog/map load, attach, control . . . Map 1 (Shared memory) Map 2 (Shared memory) . . . BPF func(): Helper func() func() func() Leveraging kernel functions through BPF helper calls !!
  • 98. KERNEL SPACE tc ip BPF library in-iproute2 bpf() SYSCALL BPF BPF . . . Map 1 (Shared memory) Map 2 (Shared memory) . . . BPF func(): Helper func() func() func() BPF Controller 1 (User App) BPF library: libbpf prog/map load, attach, control BPF Controller 2 (User App) User-to-kernel shared memory through BPF maps
  • 99. KERNEL SPACE bpf() SYSCALL BPF Controller 1 (User App) ip tc Map 1 (Shared memory) Map 2 (Shared memory) . . BPF BPF BPF func(): Helper func() func() func() BPF library in-iproute2 BPF Controller 2 (User App) . . . . . . BPF Architecture: BPF library: libbpf prog/map load, attach, control
  • 104. XDP GENERIC PATH TX APP L7 RX BPF TC Ingress PREROUTING ROUTING TCP/UDP FORWARD ROUTING INPUT OUTPUT POSTROUTING TC egress L3 output L4 L3 L3 DD
  • 105. XDP GENERIC PATH TX APP L7 RX BPF TC Ingress PREROUTING ROUTING TCP/UDP FORWARD ROUTING INPUT OUTPUT POSTROUTING TC egress L3 output L4 L3 L3 DD
  • 108. Driver XDP vs Generic XDP REDIRECT TX RX PASS BPF REDIRECT TX RX L3 BPF PASS
  • 110.
  • 116. XDP_REDIRECT BPF APP eth0 eth1 eth2 eth3 XDP_TX REDIRECT MAP
  • 117. XDP_REDIRECT BPF APP eth0 eth1 eth2 eth3 XDP_TX REDIRECT MAP
  • 118. XDP_REDIRECT BPF APP eth0 eth1 eth2 eth3 XDP_TX REDIRECT MAP
  • 120.
  • 122.
  • 123.
  • 124. XDP_REDIRECT BPF APP eth0 eth1 eth2 eth3 XDP_TX REDIRECT MAP
  • 125. XDP_REDIRECT BPF APP eth0 eth1 eth2 eth3 XDP_TX REDIRECT MAP
  • 128. DEVMAP
  • 129. DEVMAP REDIRECT TX RX BPF xdp_frame DEVMAP redirect info bpf_redirect_map Key Value(Device) 0 X 1 X 2 X xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame
  • 130. CPUMAP
  • 131. CPUMAP REDIRECT ??? RX BPF xdp_frame CPUMAP redirect info bpf_redirect_map Key Value(CPU) 0 X 1 X 2 X xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame
  • 132. CPUMAP REDIRECT netif_receive_skb_core RX BPF xdp_frame CPUMAP redirect info bpf_redirect_map Key Value(CPU) 0 X 1 X 2 X xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame xdp_frame
  • 135. ● memory model switching ○ /net/core/xdp.c ● page pool ○ /net/core/page_pool ● offload ● AF_XDP && XSK(XDP SOCKET) ● helper functions ● Device Driver Additional Topics:
  • 136. Additional Topics: ● Verifier ○ CFG, DAG, register, memory check... ● Other types ○ TC, SOCKET FILTER, CGROUP ● BTF ○ ELFutils, clang -g, llc -mattr=dwarfris ● Tail call ○ related to bpf_prog_array
  • 137. Additional Topics: ● FACEBOOK’s Katran ○ L4 Load-balancing ○ https://github.com/facebookincubator/katran ● Suricata ○ IDS/IPS engine ○ https://suricata-ids.org/ ● Cilium ○ https://cilium.io/ ● IOvisor bcc ○ https://www.iovisor.org/ ● IR Decoding ○ https://lwn.net/Articles/759188/