Linux Kernel - BPF / XDP
KossLab 유태희, 송태웅
What is BPF?
1. Berkeley Packet Filter, since 1992
2. Kernel infrastructure
- Interpreter: in-kernel virtual machine
- Hook points: in-kernel callback points
- Map
- Helper
What is BPF?
"Safe dynamic programs and tools"
"A technique for safely injecting code into the kernel at runtime"
BPF Infrastructure:
A strategy for safe code injection
1) Use BPF instructions instead of native machine code
2) Check for hazards in advance with the Verifier
3) When an (existing) kernel function is needed, call it only through a Helper function
BPF Infrastructure:
Building blocks for safe code injection
Kernel += BPF Interpreter (in-kernel virtual machine)
+ Verifier
+ BPF Helper functions (leveraging kernel functions)
+ BPF syscall (prog/map: loading, attaching, etc.)
BPF Instruction set:
1) A "junior" x86 instruction set ("simplified x86")
(Note: PLUMgrid's attempt at an x86 bytecode verifier failed)
2) BPF = classic BPF:10% + x86:70% + arm64:25% + risc:5%
3) Fixed-size instruction encoding
(for high interpreter speed)
4) Simplified, so hazards are easier to predict and prevent
(the Verifier checks loops, memory access ranges, etc.)
5) Architecture-independent
BPF Instruction set:
immediate:32 offset:16 src:4 dst:4 opcode:8
$ cat include/uapi/linux/bpf.h
[...]
struct bpf_insn {
__u8 code; /* opcode */
__u8 dst_reg:4; /* dest register */
__u8 src_reg:4; /* source register */
__s16 off; /* signed offset */
__s32 imm; /* signed immediate constant */
};
[...]
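As a rough illustration of the fixed 64-bit encoding above, the sketch below mirrors the layout with standard C types and builds one instruction by hand. The names `insn_sketch` and `mov64_imm` are made up for this example and are not part of the kernel headers; the opcode value 0xb7 is BPF_ALU64 | BPF_MOV | BPF_K.

```c
#include <stdint.h>

/* Mirror of struct bpf_insn using <stdint.h> types (a sketch, not the uapi header). */
struct insn_sketch {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* dest register */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed immediate constant */
};

/* Every instruction has the same size: 8 + 4 + 4 + 16 + 32 = 64 bits. */
_Static_assert(sizeof(struct insn_sketch) == 8, "BPF instructions are 64 bits wide");

/* Hypothetical helper: build "dst = imm" (BPF_ALU64 | BPF_MOV | BPF_K = 0xb7). */
static struct insn_sketch mov64_imm(uint8_t dst, int32_t imm)
{
    struct insn_sketch insn = {
        .code = 0xb7,
        .dst_reg = dst & 0x0f, /* only 4 bits are available for a register number */
        .src_reg = 0,
        .off = 0,
        .imm = imm,
    };
    return insn;
}
```

Calling `mov64_imm(1, 42)` yields the same bits that the BPF_MOV64_IMM(BPF_REG_1, 42) macro would produce on compilers that lay out bit-fields the way GCC and Clang do.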
BPF Instruction set:
immediate:32 offset:16 src:4 dst:4 opcode:8
opcode:8 = class:3 (low bits) + class-specific fields:5
LD/ST fields: mode:3 + size:2
ALU/JMP fields: operation:4 + source:1
eBPF: include/uapi/linux/bpf.h
cBPF: include/uapi/linux/bpf_common.h
LD/ST classes: 0x00 ~ 0x03
ALU/JMP classes: 0x04 ~ 0x07
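To make the class ranges above concrete, here is a small sketch that extracts the instruction class from an opcode byte. The class sits in the low 3 bits of the opcode; the function names `insn_class_name` and `is_load_store` are invented for this example.

```c
#include <stdint.h>

/* The instruction class lives in the low 3 bits of the opcode byte. */
#define CLASS_MASK 0x07

/* Returns the class name for an opcode byte (values as in bpf_common.h / bpf.h). */
static const char *insn_class_name(uint8_t code)
{
    switch (code & CLASS_MASK) {
    case 0x00: return "LD";
    case 0x01: return "LDX";
    case 0x02: return "ST";
    case 0x03: return "STX";
    case 0x04: return "ALU";
    case 0x05: return "JMP";
    case 0x06: return "RET/JMP32"; /* RET in cBPF; JMP32 in newer eBPF */
    default:   return "ALU64";     /* 0x07; MISC in cBPF */
    }
}

/* LD/ST classes are 0x00..0x03, ALU/JMP classes are 0x04..0x07. */
static int is_load_store(uint8_t code)
{
    return (code & CLASS_MASK) <= 0x03;
}
```

For instance, 0x18 (the first half of BPF_LD_MAP_FD) is a load/store opcode, while 0xb7 (BPF_MOV64_IMM) and 0x85 (call) are not.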
BPF Instruction set:
struct bpf_insn prog[] = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
BPF_LD_MAP_FD(BPF_REG_1, map_fd),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
BPF_EXIT_INSN(),
};
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/samples/bpf/sock_example.c
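For readers less used to raw instruction macros, the following plain user-space C sketch shows roughly the same logic as the bytecode above: read the IP protocol byte of a frame, look up a per-protocol counter, and increment it. It is only an approximation under stated assumptions: `proto_counters`, `counter_lookup`, and `count_ip_protocol` are hypothetical stand-ins for the BPF map and helper call, and the real program increments the counter atomically.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_SKETCH   14 /* Ethernet header length */
#define IP_PROTO_OFFSET    9 /* offsetof(struct iphdr, protocol) */

/* Stand-in for the BPF array map: one 64-bit counter per IP protocol number. */
static uint64_t proto_counters[256];

/* Stand-in for bpf_map_lookup_elem(): returns NULL for an out-of-range key. */
static uint64_t *counter_lookup(unsigned int key)
{
    return key < 256 ? &proto_counters[key] : NULL;
}

/* Plain-C rendering of the program logic; always returns 0 like the bytecode. */
static int count_ip_protocol(const uint8_t *frame, size_t len)
{
    size_t pos = ETH_HLEN_SKETCH + IP_PROTO_OFFSET;

    if (len <= pos)
        return 0;                        /* frame too short to hold the field */

    unsigned int proto = frame[pos];     /* BPF_LD_ABS: r0 = ip->proto */
    uint64_t *value = counter_lookup(proto);

    if (value)                           /* the JEQ skips the update on NULL */
        *value += 1;                     /* xadd in the real program */
    return 0;
}
```

Feeding it a frame whose byte 23 is 1 (ICMP) bumps `proto_counters[1]` by one.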
BPF Helper functions:
$ grep BPF_CALL
kernel/bpf/helpers.c:
BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
[...]
kernel/trace/bpf_trace.c:
BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
[...]
net/core/filter.c:
BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
BPF_CALL_3(bpf_skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
[...]
BPF as a kernel subproject
"Safe dynamic programs and tools"
$ cat MAINTAINERS | grep -A 3 BPF
BPF (Safe dynamic programs and tools)
M: Alexei Starovoitov <ast@kernel.org>
M: Daniel Borkmann <daniel@iogearbox.net>
L: netdev@vger.kernel.org
[...]
$ cat MAINTAINERS | grep -A 27 BPF
BPF (Safe dynamic programs and tools)
[...]
F: arch/x86/net/bpf_jit*
[...]
F: kernel/bpf/
F: kernel/trace/bpf_trace.c
[...]
F: net/core/filter.c
F: net/sched/act_bpf.c
F: net/sched/cls_bpf.c
[...]
[...]
F: samples/bpf/
F: tools/bpf/
F: tools/lib/bpf/
F: tools/testing/selftests/bpf/
What these entries cover:
arch/x86/net/bpf_jit* [...]: JIT. Supported archs: x86, arm, arm64, sparc, s390, powerpc, mips
kernel/bpf/: BPF core: Syscall, Interpreter, Verifier, Generic Helpers, Maps, ...
kernel/trace/bpf_trace.c, net/core/filter.c, net/sched/: Hook points, Specific Helpers, ... For cBPF, ...
samples/bpf/, tools/: bpf loading (lib), bpf tool, test codes, samples, ...
BPF Infrastructure:
Support for putting BPF programs to use
1) Hook points: in-kernel callback points
2) Map: user-to-kernel shared memory
3) Leveraging kernel functions through helpers
4) Object pinning: /sys/fs/bpf/...
BPF Architecture:
[Figure: BPF architecture. In user space, BPF Controller 1 and BPF Controller 2 (user apps using the BPF library libbpf) and ip / tc (using the BPF library in iproute2) load, attach, and control progs/maps through the bpf() SYSCALL. In KERNEL SPACE, the BPF programs call kernel func() through Helpers and share Map 1, Map 2, ... (shared memory) with user space.]
XDP
Is iptables fast enough?
Why is iptables slow?
Have you ever tuned iptables policies?
XDP
(eXpress Data Path)
XDP == FAST PATH
NORMAL PATH
[Figure: normal path. RX: DD, TC ingress, L3 input (PREROUTING, ROUTING, then INPUT or FORWARD), L4 (TCP/UDP), L7 APP. TX: APP, L4, L3 output (OUTPUT, ROUTING, POSTROUTING), TC egress, DD.]
XDP FAST PATH
[Figure: XDP fast path. BPF runs at the DD level on RX and can REDIRECT a packet straight to TX, bypassing L3, L4, and the L7 APP.]
BPF Tutorial
Prerequisites
1. One build machine
2. One test machine (x86 recommended)
3. Kernel source code
4. clang + llvm (compiler)
5. bpftool (BPF program loader)
6. An iproute2 package with BPF support
Compiler: clang + llvm
Kernel source code: the bpf tree on git.kernel.org
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
BPF program loader: bpftool
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/tools/bpf/bpftool
XDP configuration tool: iproute2
https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/iproute2.git
Examples: kernel source code and BPF sample code
samples/bpf
Example: analyzing sample code in the kernel source (xdp_rxq_info_kern.c)
samples/bpf
Compile: hands-on BPF program compilation
samples/bpf
Load the program
$ mount bpffs /sys/fs/bpf -t bpf
$ bpftool prog load ./xdp_rxq_info_kern.o /sys/fs/bpf/xdp
Check the program
$ ls /sys/fs/bpf/
$ ./bpftool prog list
$ ./bpftool prog dump xlated id X
(or: jited)
Attach the XDP program
$ ip link set dev lo xdp pin /sys/fs/bpf/xdp
Check the XDP program
$ ip link show dev lo
Remove the XDP program
$ ip link set dev lo xdp off
$ rm /sys/fs/bpf/xdp
iptables vs XDP
TEST NETWORK
PC2 (192.168.4.2) sends ICMP ($ ping) to PC1 (192.168.4.1)
Dropping packets with iptables
#PC2
$ ping 192.168.4.1
#PC1
$ iptables -A INPUT -s 192.168.4.2 -d 192.168.4.1 -p icmp -j DROP
NORMAL PATH
[Figure: normal path with iptables. The packet travels from the DD through TC ingress and L3 input (PREROUTING, ROUTING) before it is DROPped at INPUT.]
Dropping packets with XDP
Load and attach the XDP program
$ ./bpftool prog load ./xdp_icmp.o /sys/fs/bpf/xdp_icmp
$ ip link set dev lo xdp pin /sys/fs/bpf/xdp_icmp
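The slides do not show the source behind xdp_icmp.o. As a hedged sketch of the kind of decision such a program might make, the plain user-space C below parses a frame delimited by `data` and `data_end` and returns a drop verdict for IPv4 ICMP. Everything here (`drop_icmp`, `enum verdict`, the constants) is hypothetical and written for ordinary C rather than the BPF target; a real XDP program would take a `struct xdp_md *` context and return XDP_DROP or XDP_PASS.

```c
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as XDP_DROP / XDP_PASS; the names are local to this sketch. */
enum verdict { VERDICT_DROP = 1, VERDICT_PASS = 2 };

#define ETH_HDR_LEN   14
#define IPV4_MIN_HDR  20
#define ETHERTYPE_IP  0x0800
#define PROTO_ICMP    1

/* Decide the fate of one frame occupying [data, data_end). */
static enum verdict drop_icmp(const uint8_t *data, const uint8_t *data_end)
{
    /* Bounds check first: the verifier demands this before any packet access. */
    if ((size_t)(data_end - data) < ETH_HDR_LEN + IPV4_MIN_HDR)
        return VERDICT_PASS;

    /* EtherType is stored big-endian at bytes 12..13 of the Ethernet header. */
    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype != ETHERTYPE_IP)
        return VERDICT_PASS;

    /* The protocol field is byte 9 of the IPv4 header. */
    if (data[ETH_HDR_LEN + 9] == PROTO_ICMP)
        return VERDICT_DROP;

    return VERDICT_PASS;
}
```

The early length check is the important part: in real XDP code, the Verifier rejects any program that reads packet bytes without first comparing against `data_end`.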
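XDP GENERIC PATH
[Figure: XDP generic path. BPF runs on RX right after the DD, ahead of TC ingress and L3 input, and the packet is DROPped there; the rest of the normal path (PREROUTING, ROUTING, INPUT, FORWARD, TCP/UDP, APP, OUTPUT, POSTROUTING, TC egress, L3 output) is never reached.]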
BPF Tracing
iptables path vs. XDP path
BPF Tracing: iptables - DROP case
[Figure: netif_receive_skb_internal() to ipt_do_table() to DROP. Long time !! ~~]
BPF Tracing: XDP - DROP case
[Figure: netif_receive_skb_internal() to do_xdp_generic() to DROP. Short time !! ~~]
BPF Tracing: iptables vs XDP - DROP case
Functions to trace:
net/core/dev.c:
static int netif_receive_skb_internal(struct sk_buff *skb)
net/core/dev.c:
int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
net/ipv4/netfilter/ip_tables.c:
unsigned int ipt_do_table(struct sk_buff *skb, ...)
Beginning point: BPF ATTACH !! (entry of netif_receive_skb_internal())
Return point: BPF ATTACH !! (return of do_xdp_generic())
Return point: BPF ATTACH !! (return of ipt_do_table())
BPF program at the beginning point:
SEC("kprobe/netif_receive_skb_internal")
int bpf_trace_receive_skb(struct pt_regs *ctx)
{
	/* Key the map by the skb pointer and store the entry timestamp. */
	long skb_ptr = PT_REGS_PARM1(ctx);
	u64 start_time = bpf_ktime_get_ns();
	bpf_map_update_elem(&tracing_map, &skb_ptr, &start_time, BPF_ANY);
	return 0;
}
BPF program at the XDP return point:
SEC("kretprobe/do_xdp_generic")
int bpf_trace_xdp_drop(struct pt_regs *ctx)
{
	/* Note: argument registers are not guaranteed to survive until return. */
	long skb_ptr = PT_REGS_PARM2(ctx);
	int action = PT_REGS_RC(ctx);
	if (action == XDP_DROP) {
		u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
		if (!time) /* the Verifier requires a NULL check */
			return 0;
		u64 cur_time = bpf_ktime_get_ns();
		u64 delta = cur_time - *time;
		*time = delta;
...
BPF program at the iptables return point:
SEC("kretprobe/ipt_do_table")
int bpf_trace_iptables_drop(struct pt_regs *ctx)
{
	long skb_ptr = PT_REGS_PARM1(ctx);
	int action = PT_REGS_RC(ctx);
	if (action == NF_DROP) {
		u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
		if (!time) /* the Verifier requires a NULL check */
			return 0;
		u64 cur_time = bpf_ktime_get_ns();
		u64 delta = cur_time - *time;
		*time = delta;
...
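The measurement pattern used by these probes (store a timestamp keyed by the skb pointer at entry, turn it into an elapsed time at return) can be tried out in ordinary user-space C. The sketch below is a simulation under stated assumptions: `table`, `find_slot`, `on_receive`, and `on_drop` are hypothetical stand-ins for `tracing_map` and the two probe handlers, and timestamps are passed in rather than read from bpf_ktime_get_ns().

```c
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64

/* Tiny open-addressing table standing in for tracing_map (key: skb pointer). */
struct slot {
    uintptr_t key;
    uint64_t value;
    int used;
};
static struct slot table[SLOTS];

/* Find the slot for `key`; optionally create it. Returns NULL if absent or full. */
static struct slot *find_slot(uintptr_t key, int create)
{
    for (unsigned int i = 0; i < SLOTS; i++) {
        struct slot *s = &table[(key + i) % SLOTS];
        if (s->used && s->key == key)
            return s;
        if (!s->used) {
            if (!create)
                return NULL;
            s->used = 1;
            s->key = key;
            s->value = 0;
            return s;
        }
    }
    return NULL;
}

/* Entry probe: remember when this packet was first seen. */
static void on_receive(uintptr_t skb_ptr, uint64_t now_ns)
{
    struct slot *s = find_slot(skb_ptr, 1);
    if (s)
        s->value = now_ns;
}

/* Return probe: replace the start time with the elapsed time; 0 if unknown packet. */
static uint64_t on_drop(uintptr_t skb_ptr, uint64_t now_ns)
{
    struct slot *s = find_slot(skb_ptr, 0);
    if (!s)
        return 0;
    s->value = now_ns - s->value;
    return s->value;
}
```

Calling `on_receive(p, t0)` followed by `on_drop(p, t1)` leaves `t1 - t0` in the table, which is the value a user-space controller would read back from the map.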
Ftrace Tracing
iptables path vs. XDP path
$ cat /sys/kernel/debug/tracing/trace
iptables - DROP case:
netif_receive_skb_internal() {
  ktime_get_with_offset();
  __netif_receive_skb() {
    __netif_receive_skb_core() {
      ip_rcv() {
        pskb_trim_rcsum_slow();
        nf_hook_slow() {
          iptable_mangle_hook() {
            ipt_do_table() {
              __local_bh_enable_ip();
            }
          }
        }
        ip_rcv_finish() {
          udp_v4_early_demux();
          ip_route_input_noref() {
            ip_route_input_rcu() {
              ip_route_input_slow() {
                fib_table_lookup();
                fib_validate_source() {
                  __fib_validate_source() {
                    fib_table_lookup();
                  }
                }
              }
            }
          }
          ip_local_deliver() {
            nf_hook_slow() {
              iptable_mangle_hook() {
                ipt_do_table() {
                  __local_bh_enable_ip();
                }
              }
              iptable_filter_hook() {
                ipt_do_table() {
                  udp_mt();
                  __local_bh_enable_ip();
                }
              }
              kfree_skb()
DROP
XDP - DROP case:
netif_receive_skb_internal() {
  ktime_get_with_offset();
  do_xdp_generic() {
    pskb_expand_head() {
      __kmalloc_reserve.isra.48() {
        __kmalloc_node_track_caller() {
          kmalloc_slab();
          should_failslab();
        }
      }
      ksize();
      skb_free_head() {
        page_frag_free();
      }
      skb_headers_offset_update();
    }
    __bpf_prog_run32() {
      ___bpf_prog_run();
    }
    kfree_skb()
DROP
YOU WIN !!
"XDP is LOVE"
BPF internals
BPF Infrastructure:
1) Hook points: in-kernel callback points
2) LOAD, ATTACH, CALLBACK
3) Verifier / Interpreter / JIT
4) Map: user-to-kernel shared memory
5) Leveraging kernel functions through helpers
6) Object pinning: /sys/fs/bpf/...
...
Hook points: callback points
KERNEL SPACE
XDP: at the L2 device driver
tc: the point just before / just after the L3 DD
kprobe: function entry / return
. . .
Inside specific kernel functions:
if (has_bpf_prog)
BPF_PROG_RUN();
->bpf_func(ctx, insni);
bpf_func is either the BPF interpreter or JIT-compiled machine code
BPF prog injection !!
HOW ?
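The hook-point idea (check whether a program is attached, then call it through a function pointer that leads either to the interpreter or to JIT-compiled code) can be sketched in plain C as follows. This is a simplified model, not kernel code: `prog_sketch`, `hook_sketch`, `run_hook`, `always_one`, and `DEFAULT_VERDICT` are hypothetical names chosen for the example.

```c
#include <stddef.h>

/* Signature modeled on bpf_func(ctx, insni); the types here are simplified. */
typedef unsigned int (*bpf_func_t)(const void *ctx, const void *insns);

struct prog_sketch {
    bpf_func_t bpf_func; /* interpreter entry point or JITed image */
    const void *insns;   /* translated instructions for the interpreter */
};

struct hook_sketch {
    struct prog_sketch *prog; /* NULL while nothing is attached */
};

#define DEFAULT_VERDICT 2u /* hypothetical "carry on as usual" value */

/* What a hook point does: run the attached program, if there is one. */
static unsigned int run_hook(const struct hook_sketch *hook, const void *ctx)
{
    if (hook->prog)
        return hook->prog->bpf_func(ctx, hook->prog->insns);
    return DEFAULT_VERDICT;
}

/* Stand-in for a loaded program that always answers 1. */
static unsigned int always_one(const void *ctx, const void *insns)
{
    (void)ctx;
    (void)insns;
    return 1;
}
```

Attaching amounts to storing a `prog_sketch` pointer in the hook; the surrounding kernel function does not need to know whether `bpf_func` points at the interpreter or at JITed code.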
BPF prog injection !!
[Figure: user space above, KERNEL SPACE below, connected by the bpf() SYSCALL. Loaders are tc / ip (BPF library in iproute2) and the BPF Controller (user app; BPF library: libbpf; prog/map load, attach, control). Map 1, Map 2, ... are shared memory.]
Build:
C source (_kern.c), clang / llc compile, BPF program (ELF) or BPF bytecode
In the loader (user space):
1. ELF parsing
2. First relocation:
1) map fd
2) bpf to bpf call
3. Loading: bpf() SYSCALL, BPF_PROG_LOAD
HOW ? in bpf()
BPF LOAD steps:
1. BPF prog / map alloc
2. Verifier (loops, memory access ranges)
3. Second relocation:
1) map fd → map ptr
2) helper ID → func addr
4. select runtime:
1) BPF interpreter func addr
2) BPF func addr after JIT
return fd;
At the hook point:
if (has_bpf_prog)
BPF_PROG_RUN();
->bpf_func(ctx, insni);
Various ways to ATTACH a BPF program:
- socket(), send() AF_NETLINK
- bpf() syscall BPF_PROG_ATTACH
BPF_RAW_TRACEPOINT_OPEN
- kprobe event id, ioctl()
PERF_EVENT_IOC_SET_BPF
...
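The "helper ID → func addr" relocation step of the load path can be pictured with a toy user-space fixup pass. The sketch below is only a model under stated assumptions: the table `helper_table`, the struct `call_insn`, and the function `fixup_calls` are hypothetical, and the kernel's own encoding of the resolved address differs from storing a plain function pointer.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*helper_fn)(uint64_t a, uint64_t b);

static uint64_t helper_add(uint64_t a, uint64_t b) { return a + b; }
static uint64_t helper_max(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Hypothetical helper table indexed by helper ID (ID 0 is left unused). */
static const helper_fn helper_table[] = { NULL, helper_add, helper_max };
#define NUM_HELPERS (sizeof(helper_table) / sizeof(helper_table[0]))

struct call_insn {
    int32_t helper_id;  /* what user space wrote into the call instruction */
    helper_fn resolved; /* filled in by the fixup pass at load time */
};

/* Fixup pass: resolve every helper ID, rejecting unknown IDs. Returns 0 or -1. */
static int fixup_calls(struct call_insn *insns, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int32_t id = insns[i].helper_id;

        if (id <= 0 || (size_t)id >= NUM_HELPERS)
            return -1; /* a program calling an unknown helper is refused */
        insns[i].resolved = helper_table[id];
    }
    return 0;
}
```

The point of the design is that user space only ever names helpers by small integers; addresses are filled in by the kernel after verification, so a program cannot call an arbitrary kernel function.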
BPF CALLBACK !!
[Figure: once attached, the BPF programs in KERNEL SPACE are invoked at their hook points (Callback !!).]
Leveraging kernel functions through BPF Helper functions !!
[Figure: BPF programs in KERNEL SPACE call kernel func() through Helpers.]
User-to-kernel shared memory through BPF maps
[Figure: BPF Controller 1 and BPF Controller 2 (user apps; BPF library: libbpf; prog/map load, attach, control) and the in-kernel BPF programs share Map 1, Map 2, ... (shared memory).]
BPF Architecture:
[Figure: the full picture. User-space controllers (libbpf) and ip / tc (BPF library in iproute2) go through the bpf() SYSCALL; in KERNEL SPACE, BPF programs use Helpers to call kernel func() and use Map 1, Map 2, ... as shared memory.]
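XDP internals
XDP RETURN TYPE
XDP_ABORTED
XDP_DROP
XDP_PASS
XDP_TX
XDP_REDIRECT
[Figure: BPF runs in the Network Device Driver. XDP_DROP discards the packet, XDP_PASS sends it up toward the APP, XDP_TX transmits it, and XDP_REDIRECT forwards it elsewhere.]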
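For reference, the return values map to small integers. The sketch below mirrors the numeric values of enum xdp_action from include/uapi/linux/bpf.h under local names (`ACT_*`, invented here so the block compiles without kernel headers) and attaches a one-line description to each; `describe_action` is a hypothetical helper.

```c
/* Values mirror enum xdp_action in include/uapi/linux/bpf.h. */
enum xdp_action_sketch {
    ACT_ABORTED = 0,
    ACT_DROP,
    ACT_PASS,
    ACT_TX,
    ACT_REDIRECT,
};

/* Human-readable meaning of each verdict. */
static const char *describe_action(unsigned int act)
{
    switch (act) {
    case ACT_DROP:     return "drop the packet in the driver";
    case ACT_PASS:     return "hand the packet to the normal network stack";
    case ACT_TX:       return "send the packet back out of the receiving device";
    case ACT_REDIRECT: return "forward the packet to another device, CPU, or socket";
    case ACT_ABORTED:
    default:           return "program error: the packet is dropped";
    }
}
```

Unknown return values fall into the same branch as the aborted case, which matches how drivers treat them: the packet is dropped.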
Generic XDP
vs
Driver XDP
XDP GENERIC PATH
[Figure: XDP generic path. BPF runs on RX after the DD, ahead of TC ingress; the normal path (PREROUTING, ROUTING, INPUT, FORWARD, TCP/UDP, APP, OUTPUT, POSTROUTING, TC egress, L3 output) follows.]
DRIVER XDP PATH
[Figure: driver XDP path. BPF runs at L2 on RX; REDIRECT goes straight to TX, PASS continues up through L3 and L4 to the L7 APP.]
Driver XDP vs Generic XDP
[Figure: side by side. Driver XDP: BPF sits directly on RX; REDIRECT goes to TX, PASS goes up the stack. Generic XDP: BPF runs later, at the L3 side of RX; REDIRECT goes to TX, PASS goes up the stack.]
XDP data structures and SKB
[Figure: packet buffer layout: xdp_frame, HEADROOM, MAC HEADER, IP HEADER, ..., TAIL / TAILROOM, END. Pointers shown: xdp->data_hard_start, xdp->data_meta, xdp->data, and skb->data.]
Allowed DATA ACCESS range
[Figure: the same layout (HEADROOM, MAC HEADER, IP HEADER, TAIL / TAILROOM, END) with xdp->data_hard_start, xdp->data_meta, and xdp->data marked, highlighting the region a program may access.]
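To make the pointer layout more tangible, here is a user-space sketch of a buffer described by the same four pointers, with a bounds-checked read and a simplified head adjustment. The struct and function names (`xdp_buf_sketch`, `read_byte`, `adjust_head`) are hypothetical, and the adjustment rules are deliberately simpler than those of the real bpf_xdp_adjust_head() helper; in particular, this sketch simply discards metadata when the head moves.

```c
#include <stddef.h>
#include <stdint.h>

struct xdp_buf_sketch {
    uint8_t *data_hard_start; /* start of the headroom */
    uint8_t *data_meta;       /* optional metadata, just before data */
    uint8_t *data;            /* first packet byte (MAC header) */
    uint8_t *data_end;        /* one past the last packet byte */
};

/* Read one byte at `offset`, only if it lies inside [data, data_end). */
static int read_byte(const struct xdp_buf_sketch *xdp, size_t offset, uint8_t *out)
{
    if (offset >= (size_t)(xdp->data_end - xdp->data))
        return -1;
    *out = xdp->data[offset];
    return 0;
}

/* Move `data` by `delta` bytes; a negative delta grows the packet into headroom. */
static int adjust_head(struct xdp_buf_sketch *xdp, ptrdiff_t delta)
{
    ptrdiff_t headroom = xdp->data - xdp->data_hard_start;
    ptrdiff_t length = xdp->data_end - xdp->data;

    if (delta < -headroom || delta >= length)
        return -1;               /* would leave the buffer or empty the packet */
    xdp->data += delta;
    xdp->data_meta = xdp->data;  /* simplification: metadata is discarded */
    return 0;
}
```

Growing the head by 14 bytes, for example, is how a program could make room to push an extra Ethernet header in front of the packet.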
XDP_REDIRECT analysis
[Figure: BPF attached below the APP on devices eth0, eth1, eth2, eth3. XDP_TX sends a packet back out; XDP_REDIRECT uses a REDIRECT MAP to send it to another device.]
XDP_REDIRECT through bpf_redirect()
About bpf_redirect()
XDP_REDIRECT - bulkTX
[Figure: bulkTX. On RX, BPF returns REDIRECT; xdp_frames are collected in a map and handed to TX in bulk.]
DEVMAP
[Figure: DEVMAP. On RX, BPF returns REDIRECT; bpf_redirect_map records the redirect info, and xdp_frames are queued per target device before going out on TX.]
DEVMAP:
Key  Value (Device)
0    X
1    X
2    X
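The bulk-transmit idea (queue redirected frames per target device and hand them to the device in batches) can be sketched as below. This is a user-space model only: `bulk_queue`, `bulk_enqueue`, `bulk_flush`, and the batch size of 16 are assumptions made for the example, not values taken from the slides.

```c
#include <stddef.h>

#define BULK_SIZE 16 /* assumed batch size for this sketch */

struct bulk_queue {
    const void *frames[BULK_SIZE];
    size_t count;   /* frames currently waiting in the queue */
    size_t flushes; /* number of bulk transmits so far */
    size_t sent;    /* total frames handed to the device */
};

/* Hand all queued frames to the device in one call (stand-in for a bulk transmit). */
static void bulk_flush(struct bulk_queue *q)
{
    if (q->count == 0)
        return;
    q->sent += q->count;
    q->flushes++;
    q->count = 0;
}

/* Queue one redirected frame; flush first if the queue is already full. */
static void bulk_enqueue(struct bulk_queue *q, const void *frame)
{
    if (q->count == BULK_SIZE)
        bulk_flush(q);
    q->frames[q->count++] = frame;
}
```

Batching amortizes the per-transmit cost over many frames; a final `bulk_flush()` at the end of a receive cycle makes sure no frame is left waiting.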
CPUMAP
[Figure: CPUMAP. On RX, BPF returns REDIRECT with redirect info recorded by bpf_redirect_map; xdp_frames are queued for the target CPU, which then passes them to netif_receive_skb_core.]
CPUMAP:
Key  Value (CPU)
0    X
1    X
2    X
REDIRECT in GENERIC_XDP
BPFILTER
Additional Topics:
● memory model switching
○ /net/core/xdp.c
● page pool
○ /net/core/page_pool
● offload
● AF_XDP && XSK (XDP SOCKET)
● helper functions
● Device Driver
Additional Topics:
● Verifier
○ CFG, DAG, register, memory check...
● Other types
○ TC, SOCKET FILTER, CGROUP
● BTF
○ ELFutils, clang -g, llc -mattr=dwarfris
● Tail call
○ related to bpf_prog_array
Additional Topics:
● FACEBOOK's Katran
○ L4 Load-balancing
○ https://github.com/facebookincubator/katran
● Suricata
○ IPS/IDS engine
○ https://suricata-ids.org/
● Cilium
○ https://cilium.io/
● IOvisor bcc
○ https://www.iovisor.org/
● IR Decoding
○ https://lwn.net/Articles/759188/

More Related Content

What's hot

High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
ScyllaDB
 
Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracing
Viller Hsiao
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)
Brendan Gregg
 
EBPF and Linux Networking
EBPF and Linux NetworkingEBPF and Linux Networking
EBPF and Linux Networking
PLUMgrid
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
Michael Kehoe
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
Thomas Graf
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
Alexei Starovoitov
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
Brendan Gregg
 
eBPF - Observability In Deep
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In Deep
Mydbops
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
Ray Jenkins
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
Brendan Gregg
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
Brendan Gregg
 
Introduction to eBPF
Introduction to eBPFIntroduction to eBPF
Introduction to eBPF
RogerColl2
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
Brendan Gregg
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
Adrien Mahieux
 
eBPF Workshop
eBPF WorkshopeBPF Workshop
eBPF Workshop
Michael Kehoe
 
eBPF/XDP
eBPF/XDP eBPF/XDP
eBPF/XDP
Netronome
 
Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
Denys Haryachyy
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDP
Thomas Graf
 
DoS and DDoS mitigations with eBPF, XDP and DPDK
DoS and DDoS mitigations with eBPF, XDP and DPDKDoS and DDoS mitigations with eBPF, XDP and DPDK
DoS and DDoS mitigations with eBPF, XDP and DPDK
Marian Marinov
 

What's hot (20)

High-Performance Networking Using eBPF, XDP, and io_uring
BPF / XDP August Seminar, KossLab

  • 1. Linux Kernel - BPF / XDP KossLab 유태희, 송태웅
  • 2. What is BPF? 1. Berkeley Packet Filter, since 1992 2. Kernel infrastructure
  • 3. What is BPF? 1. Berkeley Packet Filter, since 1992 2. Kernel infrastructure: Interpreter (in-kernel virtual machine), Hook points (in-kernel callback points), Map, Helper
  • 4. What is BPF? “Safe dynamic programs and tools”: a technique for safely inserting code into the kernel at runtime
  • 5. BPF Infrastructure: a strategy for safe code injection 1) Use BPF instructions instead of native machine code 2) Check for hazards ahead of time with the verifier 3) Call (existing) kernel functions only through helper functions
  • 6. BPF Infrastructure: a strategy for safe code injection 1) Use BPF instructions instead of native machine code
  • 7. BPF Infrastructure: a strategy for safe code injection 2) Check for hazards ahead of time with the verifier
  • 8. BPF Infrastructure: a strategy for safe code injection 3) Call (existing) kernel functions only through helper functions
  • 9. BPF Infrastructure: building blocks for safe code injection. Kernel += BPF interpreter (in-kernel virtual machine) + Verifier + BPF helper functions (leveraging kernel functions) + BPF syscall (prog/map: loading, attaching, etc.)
  • 10. BPF Instruction set:
      1) A "junior" x86 instruction set ('simplified x86') (note: PLUMgrid's attempt at an x86 bytecode verifier failed)
      2) BPF = classic BPF: 10% + x86: 70% + arm64: 25% + risc: 5%
      3) Fixed-size instruction encoding (for high interpreter speed)
      4) Simplified, so hazards are easier to predict and prevent (the verifier checks loops, memory access ranges, etc.)
      5) Architecture-independent
  • 11. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8
      $ cat include/uapi/linux/bpf.h
      [...]
      struct bpf_insn {
              __u8    code;           /* opcode */
              __u8    dst_reg:4;      /* dest register */
              __u8    src_reg:4;      /* source register */
              __s16   off;            /* signed offset */
              __s32   imm;            /* signed immediate constant */
      };
      [...]
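To make the layout above concrete, here is a minimal user-space sketch that mirrors the fields of `struct bpf_insn` with fixed-width types and packs them into the 64-bit word drawn on the slide (opcode in the lowest byte, immediate in the upper 32 bits). The names `insn_sketch`, `pack_insn`, `mov_r0_0`, and `exit_insn` are hypothetical and do not come from the kernel headers. The opcode bytes 0xb7 (`BPF_ALU64 | BPF_MOV | BPF_K`) and 0x95 (`BPF_JMP | BPF_EXIT`) follow the uapi definitions. The sketch assumes the little-endian bit-field order, where `dst_reg` occupies the low nibble.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space mirror of struct bpf_insn (illustration only). */
struct insn_sketch {
    uint8_t code;     /* opcode */
    uint8_t dst_reg;  /* destination register, low 4 bits used */
    uint8_t src_reg;  /* source register, low 4 bits used */
    int16_t off;      /* signed offset */
    int32_t imm;      /* signed immediate constant */
};

/* Pack the fields as immediate:32 | offset:16 | src:4 | dst:4 | opcode:8. */
static uint64_t pack_insn(struct insn_sketch in)
{
    return ((uint64_t)(uint32_t)in.imm << 32) |
           ((uint64_t)(uint16_t)in.off << 16) |
           ((uint64_t)(in.src_reg & 0x0f) << 12) |
           ((uint64_t)(in.dst_reg & 0x0f) << 8) |
           (uint64_t)in.code;
}

/* r0 = 0, the encoding produced by BPF_MOV64_IMM(BPF_REG_0, 0). */
static const struct insn_sketch mov_r0_0 = { 0xb7, 0, 0, 0, 0 };

/* exit, the encoding produced by BPF_EXIT_INSN(). */
static const struct insn_sketch exit_insn = { 0x95, 0, 0, 0, 0 };
```

Because every instruction has the same 8-byte shape, an interpreter can fetch and decode with fixed shifts and masks, which is the "fixed-size encoding for interpreter speed" point from the previous slide.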
  • 12. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8, where opcode = fields:5 + class:3. eBPF: include/uapi/linux/bpf.h; cBPF: include/uapi/linux/bpf_common.h
  • 13. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8, where opcode = class:3 plus either LD/ST fields (mode:3, size:2) or ALU/JMP fields (op:4, source:1). eBPF: include/uapi/linux/bpf.h; cBPF: include/uapi/linux/bpf_common.h. LD/ST classes: 0x00 ~ 0x03; ALU/JMP classes: 0x04 ~ 0x07
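The class split above can be sketched in a few lines of user-space C. The function names `insn_class`, `is_load_store`, and `class_name` are hypothetical; the class values follow `include/uapi/linux/bpf_common.h` and `include/uapi/linux/bpf.h`. The names for 0x06 and 0x07 are the eBPF ones (`JMP32`, `ALU64`); in classic BPF those two values are `RET` and `MISC`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The instruction class is stored in the low 3 bits of the opcode byte. */
static unsigned insn_class(uint8_t code)
{
    return code & 0x07;
}

/* Classes 0x00-0x03 are loads/stores; 0x04-0x07 are ALU/jump classes. */
static int is_load_store(uint8_t code)
{
    return insn_class(code) <= 0x03;
}

/* eBPF class names indexed by class value. */
static const char *class_name(uint8_t code)
{
    static const char *const names[8] = {
        "LD", "LDX", "ST", "STX", "ALU", "JMP", "JMP32", "ALU64",
    };
    return names[insn_class(code)];
}
```

For example, the opcode byte 0xb7 (a 64-bit move of an immediate) decodes to class 0x07, and 0x95 (exit) decodes to class 0x05.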
  • 15. BPF Instruction set:
      struct bpf_insn prog[] = {
              BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
              BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */),
              BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
              BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
              BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
              BPF_LD_MAP_FD(BPF_REG_1, map_fd),
              BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
              BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
              BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
              BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
              BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
              BPF_EXIT_INSN(),
      };
      https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/samples/bpf/sock_example.c
  • 16. BPF helper functions:
      $ grep BPF_CALL
      kernel/bpf/helpers.c:
      BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
      BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
      [...]
      kernel/trace/bpf_trace.c:
      BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
      BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
      BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
      BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
      [...]
      net/core/filter.c:
      BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
      BPF_CALL_3(bpf_skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
      [...]
  • 17. BPF as a kernel subproject: “Safe dynamic programs and tools”
      $ cat MAINTAINERS | grep -A 3 BPF
      BPF (Safe dynamic programs and tools)
      M: Alexei Starovoitov <ast@kernel.org>
      M: Daniel Borkmann <daniel@iogearbox.net>
      L: netdev@vger.kernel.org
      [...]
  • 18. BPF as a kernel subproject: “Safe dynamic programs and tools”
      $ cat MAINTAINERS | grep -A 27 BPF
      BPF (Safe dynamic programs and tools)
      [...]
      F: arch/x86/net/bpf_jit*
      [...]
      F: kernel/bpf/
      F: kernel/trace/bpf_trace.c
      [...]
      F: net/core/filter.c
      F: net/sched/act_bpf.c
      F: net/sched/cls_bpf.c
      [...]
      [...]
      F: samples/bpf/
      F: tools/bpf/
      F: tools/lib/bpf/
      F: tools/testing/selftests/bpf/
  • 19. BPF as a kernel subproject (same MAINTAINERS listing). JIT-supported architectures: x86, arm, arm64, sparc, s390, powerpc, mips
  • 20. BPF as a kernel subproject (same MAINTAINERS listing). BPF core: syscall, interpreter, verifier, generic helpers, maps, ...
  • 21. BPF as a kernel subproject (same MAINTAINERS listing). Hook points, specific helpers, ...; for cBPF, ...
  • 22. BPF as a kernel subproject (same MAINTAINERS listing). BPF loading (lib), bpf tool, test code, samples, ...
  • 23. BPF Infrastructure: support for using BPF programs 1) Hook points: in-kernel callback points 2) Maps: user-to-kernel shared memory 3) Leveraging kernel functions through helper calls 4) Object pinning: /sys/fs/bpf/...
  • 24. BPF Architecture (diagram): BPF controllers in user space (user apps, ip, tc) use a BPF library (libbpf, or the BPF library in iproute2) for prog/map load, attach, and control through the bpf() syscall. In kernel space, BPF programs call helper functions (func()) and share Map 1, Map 2, ... (shared memory) with the controllers.
  • 25. XDP
  • 30. XDP == FAST PATH
  • 31. NORMAL PATH (diagram of the regular kernel packet path; labels: RX, DD, L3 input, TC ingress, PREROUTING, ROUTING, INPUT, FORWARD, TCP/UDP (L4), APP (L7), OUTPUT, ROUTING, POSTROUTING, TC egress, L3 output, TX)
  • 35. Prerequisites: 1. One build machine 2. One test machine (x86 recommended) 3. Kernel source code 4. clang + llvm (compiler) 5. bpftool (BPF program loader) 6. An iproute2 package with BPF support
  • 37. Kernel source code: the bpf tree on git.kernel.org https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
  • 40. samples/bpf examples: kernel source code and BPF sample code
  • 41. samples/bpf example (xdp_rxq_info_kern.c): analyzing sample code in the kernel source
  • 42. Compiling samples/bpf: hands-on BPF program compilation
  • 43. Loading the program:
      $ mount bpffs /sys/fs/bpf -t bpf
      $ bpftool prog load ./xdp_rxq_info_kern.o /sys/fs/bpf/xdp
  • 44. Inspecting the program:
      $ ls /sys/fs/bpf/
      $ ./bpftool prog list
      $ ./bpftool prog dump xlated id X
      $ ./bpftool prog dump jited id X
  • 45. Attaching the XDP program:
      $ ip link set dev lo xdp pin /sys/fs/bpf/xdp
  • 46. Checking the XDP program attachment:
      $ ip link show dev lo
  • 47. Removing the XDP program:
      $ ip link set dev lo xdp off
      $ rm /sys/fs/bpf/xdp
  • 51. #PC2
      $ ping 192.168.4.1
      #PC1
      $ iptables -A INPUT -s 192.168.4.2 -d 192.168.4.1 -p icmp -j DROP
  • 52. NORMAL PATH (same packet-path diagram as slide 31)
  • 53. NORMAL PATH (same packet-path diagram, with DROP marked on the path)
  • 55. Attaching the XDP program:
      $ ./bpftool prog load ./xdp_icmp.o /sys/fs/bpf/xdp_icmp
      $ ip link set dev lo xdp pin /sys/fs/bpf/xdp_icmp
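The source of `xdp_icmp.o` is not shown in the slides. As a rough idea of what such a program has to decide, the following user-space C sketch models the verdict logic: bounds-check the packet, confirm it is IPv4, and return a drop verdict for ICMP. Everything here is an assumption made for illustration: the function name `classify_packet` and the `SK_`-prefixed constants are hypothetical, IP options and VLAN tags are ignored, and a real XDP program would read `data` and `data_end` from `struct xdp_md` instead of taking pointers as arguments. The verdict values match `XDP_DROP` (1) and `XDP_PASS` (2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as XDP_DROP and XDP_PASS in the kernel uapi. */
enum { SK_XDP_DROP = 1, SK_XDP_PASS = 2 };

#define SK_ETH_HLEN     14      /* Ethernet header length */
#define SK_IP_HLEN_MIN  20      /* IPv4 header length without options */
#define SK_ETH_P_IP     0x0800  /* EtherType for IPv4 */
#define SK_IPPROTO_ICMP 1       /* IPv4 protocol number for ICMP */

/*
 * Model of an XDP-style verdict. The packet length is checked before any
 * byte is read, which is the kind of bounds check the verifier insists on.
 */
static int classify_packet(const uint8_t *data, const uint8_t *data_end)
{
    if ((size_t)(data_end - data) < SK_ETH_HLEN + SK_IP_HLEN_MIN)
        return SK_XDP_PASS;

    /* EtherType is stored big-endian in bytes 12 and 13. */
    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype != SK_ETH_P_IP)
        return SK_XDP_PASS;

    /* The protocol field sits at offset 9 of the IPv4 header. */
    if (data[SK_ETH_HLEN + 9] == SK_IPPROTO_ICMP)
        return SK_XDP_DROP;

    return SK_XDP_PASS;
}
```

Returning a pass verdict for anything that cannot be parsed keeps unrelated traffic flowing, so only the ping packets from the earlier test setup are affected.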
  • 56. XDP GENERIC PATH (same packet-path diagram, with a BPF hook placed right after RX, ahead of TC ingress, and DROP marked at that hook)
  • 59. BPF Tracing: iptables - DROP case. netif_receive_skb_internal() -> ... -> ipt_do_table() -> DROP: long time!
  • 61. BPF Tracing: XDP - DROP case. netif_receive_skb_internal() -> do_xdp_generic() -> DROP: short time!
  • 62. BPF Tracing: iptables vs XDP - DROP case. From netif_receive_skb_internal(), do_xdp_generic() reaches DROP in a short time, while ipt_do_table() reaches DROP only after a long time.
  • 63. BPF Tracing: iptables vs XDP - DROP case
      net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb)
      net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) -> DROP
      net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) -> DROP
  • 65. BPF Tracing: iptables vs XDP - DROP case. BPF is attached at three points: the beginning point (entry of netif_receive_skb_internal()) and the two return points (return of do_xdp_generic() and return of ipt_do_table()).
  • 66. BPF Tracing: iptables vs XDP - DROP case (kprobe at the entry of netif_receive_skb_internal())
      SEC("kprobe/netif_receive_skb_internal")
      int bpf_trace_receive_skb(struct pt_regs *ctx)
      {
              /* Key the map by the skb pointer (first argument). */
              long skb_ptr = PT_REGS_PARM1(ctx);
              u64 start_time = bpf_ktime_get_ns();

              bpf_map_update_elem(&tracing_map, &skb_ptr, &start_time, BPF_ANY);
              return 0;
      }
  • 68. BPF Tracing: iptables vs XDP - DROP case (kretprobe at the return of do_xdp_generic())
      SEC("kretprobe/do_xdp_generic")
      int bpf_trace_xdp_drop(struct pt_regs *ctx)
      {
              /* Argument registers are not guaranteed to be preserved at return time. */
              long skb_ptr = PT_REGS_PARM2(ctx);
              int action = PT_REGS_RC(ctx);

              if (action == XDP_DROP) {
                      u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
                      u64 cur_time = bpf_ktime_get_ns();
                      u64 delta;

                      if (!time)      /* the verifier requires a NULL check on map lookups */
                              return 0;
                      delta = cur_time - *time;
                      *time = delta;
                      ...
  • 70. BPF Tracing: iptables vs XDP - DROP case (kretprobe at the return of ipt_do_table())
      SEC("kretprobe/ipt_do_table")
      int bpf_trace_iptables_drop(struct pt_regs *ctx)
      {
              /* Argument registers are not guaranteed to be preserved at return time. */
              long skb_ptr = PT_REGS_PARM1(ctx);
              int action = PT_REGS_RC(ctx);

              if (action == NF_DROP) {
                      u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
                      u64 cur_time = bpf_ktime_get_ns();
                      u64 delta;

                      if (!time)      /* the verifier requires a NULL check on map lookups */
                              return 0;
                      delta = cur_time - *time;
                      *time = delta;
                      ...
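The measurement idea in these probes (store a start timestamp keyed by the skb pointer at entry, then compute the delta when a drop verdict is returned) can be modeled in plain user-space C. This is only a sketch of the bookkeeping, not BPF code: `table`, `find_slot`, `on_receive`, and `on_drop` are hypothetical names, a small fixed-size open-addressing table stands in for `tracing_map`, and timestamps are passed in rather than read with `bpf_ktime_get_ns()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64  /* hypothetical table size for this model */

struct slot {
    uintptr_t key;      /* skb pointer value used as the map key */
    uint64_t start_ns;  /* timestamp recorded at the entry probe */
    int used;
};

static struct slot table[SLOTS];

/* Linear-probing lookup; optionally claims a free slot for a new key. */
static struct slot *find_slot(uintptr_t key, int create)
{
    for (unsigned i = 0; i < SLOTS; i++) {
        struct slot *s = &table[(key + i) % SLOTS];
        if (s->used && s->key == key)
            return s;
        if (!s->used) {
            if (!create)
                return NULL;
            s->used = 1;
            s->key = key;
            return s;
        }
    }
    return NULL;  /* table full */
}

/* Entry probe: remember when this skb entered the receive path. */
static void on_receive(uintptr_t skb, uint64_t now_ns)
{
    struct slot *s = find_slot(skb, 1);
    if (s)
        s->start_ns = now_ns;
}

/* Return probe on a drop verdict: elapsed ns, or 0 if the skb was never seen. */
static uint64_t on_drop(uintptr_t skb, uint64_t now_ns)
{
    struct slot *s = find_slot(skb, 0);
    if (!s)
        return 0;
    return now_ns - s->start_ns;
}
```

The missing-entry branch in `on_drop` plays the same role as the NULL check after `bpf_map_lookup_elem()`: a return probe can fire for a packet whose entry was never recorded.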
  • 77. $ cat /sys/kernel/debug/tracing/trace
    iptables path:
        netif_receive_skb_internal() {
          ktime_get_with_offset();
          __netif_receive_skb() {
            __netif_receive_skb_core() {
              ip_rcv() {
                pskb_trim_rcsum_slow();
                nf_hook_slow() {
                  iptable_mangle_hook() {
                    ipt_do_table() {
                      __local_bh_enable_ip();
                    }
                  }
                }
                ip_rcv_finish() {
                  udp_v4_early_demux();
                  ip_route_input_noref() {
                    ip_route_input_rcu() {
                      ip_route_input_slow() {
                        fib_table_lookup();
                        fib_validate_source() {
                          __fib_validate_source() {
                            fib_table_lookup();
                          }
                        }
                      }
                    }
                  }
                  ip_local_deliver() {
                    nf_hook_slow() {
                      iptable_mangle_hook() {
                        ipt_do_table() {
                          __local_bh_enable_ip();
                        }
                      }
                      iptable_filter_hook() {
                        ipt_do_table() {
                          udp_mt();
                          __local_bh_enable_ip();
                        }
                      }
                      kfree_skb()    <- DROP
    Generic XDP path:
        netif_receive_skb_internal() {
          ktime_get_with_offset();
          do_xdp_generic() {
            pskb_expand_head() {
              __kmalloc_reserve.isra.48() {
                __kmalloc_node_track_caller() {
                  kmalloc_slab();
                  should_failslab();
                }
              }
              ksize();
              skb_free_head() {
                page_frag_free();
              }
              skb_headers_offset_update();
            }
            __bpf_prog_run32() {
              ___bpf_prog_run();
            }
            kfree_skb()    <- DROP
    YOU WIN !! “XDP is LOVE”
  • 79. BPF Infrastructure:
        1) Hook points: in-kernel callback points
        2) LOAD, ATTACH, CALLBACK
        3) Verifier / Interpreter / JIT
        4) Map: user-to-kernel shared memory
        5) Helpers: leveraging kernel functions through helper calls
        6) Object pinning: /sys/fs/bpf/…
        ...
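  • 83. Hook points (callback points) in KERNEL SPACE, where BPF prog injection happens !!
        ○ XDP: at the L2 device driver
        ○ tc: just before / just after L3 and the device driver (DD)
        ○ kprobe: function entry / return
        ○ . . .
    Inside a specific kernel function, the hook looks like:
        if (has_bpf_prog) BPF_PROG_RUN(); ->bpf_func(ctx, insni);
    where bpf_func is either the BPF interpreter or JIT-compiled machine code.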
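The hook-point idea on this slide, "if a program is attached, call its `bpf_func`", can be illustrated with a function pointer. This is a simplified user-space sketch, assuming a cut-down `struct prog` and made-up verdict values. The names `run_hook`, `drop_odd` and `VERDICT_*` are hypothetical and only loosely modelled on the kernel's `struct bpf_prog`.

```c
#include <assert.h>
#include <stddef.h>

struct insn { int opcode; }; /* placeholder for a BPF instruction */

/* Cut-down program object: an entry point plus its instructions. */
struct prog {
    unsigned int (*bpf_func)(const void *ctx, const struct insn *insns);
    const struct insn *insns;
};

enum verdict { VERDICT_DROP = 1, VERDICT_PASS = 2 }; /* hypothetical values */

/* A hook point holds at most one attached program in this sketch. */
struct hook { const struct prog *prog; };

/* The "if (has_bpf_prog) BPF_PROG_RUN()" pattern: run the attached
 * program if there is one, otherwise keep the default behaviour. */
static unsigned int run_hook(const struct hook *h, const void *ctx)
{
    if (h->prog)
        return h->prog->bpf_func(ctx, h->prog->insns);
    return VERDICT_PASS;
}

/* Stand-in for an interpreter or JIT entry point: drops odd "packets". */
static unsigned int drop_odd(const void *ctx, const struct insn *insns)
{
    (void)insns;
    return (*(const int *)ctx & 1) ? VERDICT_DROP : VERDICT_PASS;
}
```

Because the hook only sees a function pointer, it does not need to know whether the target is the interpreter or JIT output, which is the point the slide makes.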
  • 84. Hook points (callback points) in KERNEL SPACE: XDP at the L2 device driver, tc just before / just after L3 and the device driver (DD), kprobe at function entry / return, . . . BPF prog injection into these points !! HOW ?
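  • 88. Loading a BPF program from user space into KERNEL SPACE:
        ○ Build: C source (_kern.c) → compiled with clang / llc → BPF program (or BPF bytecode) in a BPF ELF file
        ○ Loader: tc, ip (BPF library in iproute2) → bpf() SYSCALL
        ○ Loader steps: 1. ELF parsing; 2. first-pass relocation: 1) map fd, 2) bpf-to-bpf call; 3. loading (BPF_PROG_LOAD)
        ○ Result: BPF prog injection !! plus Map 1 (shared memory)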
  • 90. Besides tc and ip (BPF library in iproute2), a BPF Controller (user app) can use the BPF library libbpf for prog/map load, attach, control, . . . Both go through the bpf() SYSCALL after the same steps (C source _kern.c → clang / llc → BPF ELF; 1. ELF parsing; 2. first-pass relocation: 1) map fd, 2) bpf-to-bpf call; 3. loading), ending in BPF prog injection into KERNEL SPACE with Map 1 (shared memory). HOW ? in bpf()
  • 94. BPF LOAD process inside the bpf() SYSCALL (KERNEL SPACE), reached from tc / ip (BPF library in iproute2) or a BPF Controller (user app, libbpf) after ELF parsing and first-pass relocation:
        1. BPF prog / map alloc
        2. Verifier (loops, memory access ranges)
        3. Second-pass relocation: 1) map fd → map ptr, 2) helper ID → func addr
        4. Select runtime: 1) BPF interpreter func addr, or 2) BPF func addr after JIT
        return fd;
    The selected address is what the hook point later calls:
        if (has_bpf_prog) BPF_PROG_RUN(); ->bpf_func(ctx, insni);
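Steps 3 and 4 of the load process can be sketched in plain C. This is a toy model under stated assumptions, not the kernel implementation: the instruction layout, `fd_table`, `helper_table`, `relocate` and `select_runtime` are all hypothetical. It only illustrates replacing a map fd with a map pointer, replacing a helper ID with a function address, and then choosing between the interpreter and JIT output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum kind { INSN_OTHER, INSN_LD_MAP_FD, INSN_CALL_HELPER };

struct map { int id; };
typedef uint64_t (*helper_fn)(uint64_t);

/* Toy instruction: imm holds a map fd or helper ID before relocation;
 * map / fn hold the resolved kernel-side objects afterwards. */
struct insn {
    enum kind kind;
    int32_t imm;
    const struct map *map;
    helper_fn fn;
};

static uint64_t helper_double(uint64_t x) { return 2 * x; }
static uint64_t helper_inc(uint64_t x) { return x + 1; }

/* Hypothetical helper table indexed by helper ID (ID 0 is invalid). */
static const helper_fn helper_table[] = { NULL, helper_double, helper_inc };
#define NR_HELPERS (sizeof(helper_table) / sizeof(helper_table[0]))

/* Second-pass relocation: map fd -> map ptr, helper ID -> func addr.
 * Returns 0 on success, -1 if an fd or helper ID cannot be resolved. */
static int relocate(struct insn *insns, size_t n,
                    const struct map *const *fd_table, size_t nr_fds)
{
    for (size_t i = 0; i < n; i++) {
        struct insn *in = &insns[i];

        if (in->kind == INSN_LD_MAP_FD) {
            if (in->imm < 0 || (size_t)in->imm >= nr_fds || !fd_table[in->imm])
                return -1;
            in->map = fd_table[in->imm];
        } else if (in->kind == INSN_CALL_HELPER) {
            if (in->imm <= 0 || (size_t)in->imm >= NR_HELPERS)
                return -1;
            in->fn = helper_table[in->imm];
        }
    }
    return 0;
}

enum runtime { RUNTIME_INTERPRETER, RUNTIME_JIT };

/* Select runtime: use JIT output when enabled and successful,
 * otherwise fall back to the interpreter entry point. */
static enum runtime select_runtime(int jit_enabled, int jit_ok)
{
    return (jit_enabled && jit_ok) ? RUNTIME_JIT : RUNTIME_INTERPRETER;
}
```

Resolving fds and IDs once at load time means the program does no lookups by number when it runs at the hook point.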
  • 95. After loading, the BPF programs and maps (Map 1, Map 2, . . .: shared memory) live in KERNEL SPACE. Various BPF ATTACH methods:
        - socket(), send() over AF_NETLINK
        - bpf() syscall: BPF_PROG_ATTACH, BPF_RAW_TRACEPOINT_OPEN
        - kprobe event id, ioctl() PERF_EVENT_IOC_SET_BPF
        ...
  • 96. BPF CALLBACK !! Once attached, each BPF program in KERNEL SPACE runs as a callback at its hook point. In user space: tc, ip (BPF library in iproute2) and the BPF Controller (user app, BPF library libbpf: prog/map load, attach, control, . . .) via the bpf() SYSCALL; Map 1, Map 2, . . . (shared memory).
  • 97. BPF func(): a BPF program in KERNEL SPACE calls Helper func() entries. Leveraging kernel functions through BPF helper function calls !!
  • 98. User-to-kernel shared memory through BPF maps: BPF Controller 1 and BPF Controller 2 (user apps, BPF library libbpf: prog/map load, attach, control) and the BPF programs in KERNEL SPACE access the same Map 1, Map 2, . . . (shared memory).
  • 99. BPF Architecture:
        ○ User space: tc, ip (BPF library in iproute2); BPF Controller 1, BPF Controller 2 (user apps, BPF library libbpf: prog/map load, attach, control), . . .
        ○ Boundary: bpf() SYSCALL
        ○ KERNEL SPACE: BPF programs, BPF func() calling Helper func(), Map 1, Map 2, . . . (shared memory)
  • 104. XDP GENERIC PATH (diagram). Labels: RX, TX; DD, L3, L4, L7 (APP); BPF; TC ingress, TC egress; PREROUTING, ROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING; TCP/UDP; L3 output.
  • 108. Driver XDP vs Generic XDP (diagram). In both panels a BPF program sits on the RX side with PASS, TX and REDIRECT outcomes; the Generic XDP panel also shows L3.
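  • 116. XDP_REDIRECT (diagram). Labels: BPF, APP, eth0, eth1, eth2, eth3, XDP_TX, REDIRECT MAP. XDP_TX sends a frame back out of the receiving device, while XDP_REDIRECT uses a redirect map to pick another target.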
  • 128. DEVMAP
  • 129. DEVMAP (diagram): on RX the BPF program calls bpf_redirect_map, which records redirect info for the xdp_frame; the DEVMAP maps a key to a device, and the queued xdp_frames are sent out on that device's TX (REDIRECT).
        Key | Value (Device)
        0   | X
        1   | X
        2   | X
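The key-to-device lookup behind DEVMAP can be sketched as a small array-backed map. This is a user-space illustration only: `devmap`, `redirect_target` and `redirect_map` are hypothetical names, and the fallback rule (return the low two bits of `flags` when the key has no entry) is modelled on the documented behaviour of `bpf_redirect_map()` in recent kernels, which is an assumption about the kernel version in use. The action values match `enum xdp_action` in the kernel UAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as the kernel's enum xdp_action. */
enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

#define DEVMAP_SIZE 4

/* Hypothetical devmap: key -> ifindex, 0 meaning "no device in this slot". */
static int devmap[DEVMAP_SIZE];

/* Redirect info recorded for the current frame (per-CPU in a real kernel). */
static int redirect_target;

/* Sketch of a redirect-map lookup: on a hit, remember the target device
 * and return XDP_REDIRECT; on a miss, fall back to the action encoded in
 * the low two bits of flags. */
static int redirect_map(uint32_t key, uint64_t flags)
{
    if (key >= DEVMAP_SIZE || devmap[key] == 0)
        return (int)(flags & 3);
    redirect_target = devmap[key];
    return XDP_REDIRECT;
}
```

A program that wants unmatched traffic to continue up the stack would pass `XDP_PASS` as `flags`; passing 0 yields `XDP_ABORTED` on a miss.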
  • 130. CPUMAP
  • 132. CPUMAP (diagram): on RX the BPF program calls bpf_redirect_map, which records redirect info for the xdp_frame; the CPUMAP maps a key to a CPU, and the queued xdp_frames are handed to netif_receive_skb_core on that CPU (REDIRECT).
        Key | Value (CPU)
        0   | X
        1   | X
        2   | X
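The CPUMAP hand-off can be pictured as one queue per target CPU: the receiving CPU enqueues frames, and the target CPU later dequeues them and passes them up the stack. The sketch below is a single-threaded toy model with hypothetical names (`cpumap_enqueue`, `cpumap_dequeue`, `QUEUE_SIZE`). A real implementation needs a concurrency-safe ring and a consumer thread per CPU, both omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SIZE 8
#define NR_CPUS 3

struct frame { int id; }; /* stand-in for an xdp_frame */

/* One ring per target CPU; head and tail only ever increase. */
struct cpu_queue {
    const struct frame *ring[QUEUE_SIZE];
    size_t head, tail;
};
static struct cpu_queue cpumap[NR_CPUS];

/* RX side: queue a frame for the chosen CPU; -1 if the CPU key is
 * invalid or the ring is full (the frame would be dropped). */
static int cpumap_enqueue(unsigned int cpu, const struct frame *f)
{
    struct cpu_queue *q;

    if (cpu >= NR_CPUS)
        return -1;
    q = &cpumap[cpu];
    if (q->head - q->tail == QUEUE_SIZE)
        return -1;
    q->ring[q->head++ % QUEUE_SIZE] = f;
    return 0;
}

/* Target-CPU side: take the next frame, or NULL when the ring is empty.
 * In the kernel the frame would then enter the normal receive path. */
static const struct frame *cpumap_dequeue(unsigned int cpu)
{
    struct cpu_queue *q;

    if (cpu >= NR_CPUS)
        return NULL;
    q = &cpumap[cpu];
    if (q->head == q->tail)
        return NULL;
    return q->ring[q->tail++ % QUEUE_SIZE];
}
```

Frames leave each ring in arrival order, so ordering within one target CPU is preserved.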
  • 135. Additional Topics:
        ● memory model switching
            ○ /net/core/xdp.c
        ● page pool
            ○ /net/core/page_pool
        ● offload
        ● AF_XDP && XSK (XDP SOCKET)
        ● helper functions
        ● Device Driver
  • 136. Additional Topics:
        ● Verifier
            ○ CFG, DAG, register, memory check...
        ● Other types
            ○ TC, SOCKET FILTER, CGROUP
        ● BTF
            ○ ELFutils, clang -g, llc -mattr=dwarfris
        ● Tail call
            ○ related to bpf_prog_array
  • 137. Additional Topics:
        ● Facebook's Katran
            ○ L4 load balancing
            ○ https://github.com/facebookincubator/katran
        ● Suricata
            ○ IPS/IDS engine
            ○ https://suricata-ids.org/
        ● Cilium
            ○ https://cilium.io/
        ● IOvisor bcc
            ○ https://www.iovisor.org/
        ● IR Decoding
            ○ https://lwn.net/Articles/759188/