SlideShare a Scribd company logo
E cient System Monitoring in Cloud NativeE cient System Monitoring in Cloud Native
EnvironmentsEnvironments
gergely.szabo@origoss.com
About MyselfAbout Myself
more than 15 years in the industry
research, development, system architect, etc...
currently at Origoss Solutions
Cloud Native
Kubernetes
Prometheus
AgendaAgenda
BPF
Linux kernel tracing
EBPF
EBPF-based in monitoring in the cloud
BPFBPF
Packet Filtering ProblemPacket Filtering Problem
Filtering RequirementsFiltering Requirements
Ef cient
Flexible lter rules
Safe
BPFBPF
Steven McCanne and Van Jacobson:Steven McCanne and Van Jacobson:
The BSD Packet Filter: A New Architecture for User-levelThe BSD Packet Filter: A New Architecture for User-level
Packet Capture, 1992Packet Capture, 1992
http://www.tcpdump.org/papers/bpf-usenix93.pdf
(http://www.tcpdump.org/papers/bpf-usenix93.pdf)
BPF ArchitectureBPF Architecture
Capturing without FilteringCapturing without Filtering
In [ ]: %%bash
sudo tcpdump -nc 4
Simple Filtering RuleSimple Filtering Rule
In [ ]: %%bash
sudo tcpdump -nc 4 tcp and port 80
Complex RuleComplex Rule
To print all IPv4 HTTP packets to and from port 80, i.e. print only packets that contain
data, not, for example, SYN and FIN packets and ACK-only packets.
In [ ]: %%bash
sudo tcpdump -nc 4 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0
xf0)>>2)) != 0)'
How Does This Work?How Does This Work?
BPF VM Instruction SetBPF VM Instruction Set
Simple Filtering RuleSimple Filtering Rule
In [ ]: %%bash
tcpdump -d tcp and port 80
Complex RuleComplex Rule
In [ ]: %%bash
tcpdump -d 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>
2)) != 0)'
Linux Kernel TracepointsLinux Kernel Tracepoints
A tracepoint placed in code provides a hook to call a function (probe) that you
can provide at runtime.
A tracepoint can be "on" or "off"
When a tracepoint is "on", the function you provide is called each time
the tracepoint is executed
They can be used for tracing and performance accounting.
Adding TracepointsAdding Tracepoints
void blk_requeue_request(struct request_queue *q, struct request *rq)
{
blk_delete_timer(rq);
blk_clear_rq_complete(rq);
trace_block_rq_requeue(q, rq); // <- Tracepoint hook
if (rq->cmd_flags & REQ_QUEUED)
blk_queue_end_tag(q, rq);
BUG_ON(blk_queued_rq(rq));
elv_requeue_request(q, rq);
}
List of TracepointsList of Tracepoints
In [ ]: %%bash
perf list tracepoint
Tracepoints in ActionTracepoints in Action
In [ ]: %%bash
sudo perf stat -a -e kmem:kmalloc sleep 10
Linux Kernel KProbesLinux Kernel KProbes
dynamically break into any kernel routine and collect debugging and
performance information non-disruptively.
some parts of the kernel code can not be trapped
two types of probes: kprobes, and kretprobes
A kprobe can be inserted on virtually any instruction in the kernel.
A return probe res when a speci ed function returns.
List of KProbesList of KProbes
In [ ]: %%bash
sudo cat /sys/kernel/debug/kprobes/list
Probing a Linux FunctionProbing a Linux Function
In [ ]:
void blk_delete_timer(struct request *req)
{
list_del_init(&req->timeout_list);
}
%%bash
sudo sh -c 'echo p:demo_probe blk_delete_timer >> /sys/kernel/debug/tracing/kpro
be_events'
List of KProbesList of KProbes
In [ ]:
In [ ]:
%%bash
sudo cat /sys/kernel/debug/kprobes/list
%%bash
sudo perf list | grep demo
KProbes in ActionKProbes in Action
In [ ]: %%bash
sudo perf stat -a -e kprobes:demo_probe sleep 10
Removing KProbeRemoving KProbe
In [ ]:
In [ ]:
In [ ]:
%%bash
sudo sh -c 'echo "-:demo_probe" >> /sys/kernel/debug/tracing/kprobe_events'
%%bash
sudo cat /sys/kernel/debug/kprobes/list
%%bash
sudo perf list | grep demo
EBPFEBPF
Recent Developments: eBPFRecent Developments: eBPF
v3.15: BPF machine upgrade (64bit registers, more registers, new instruction)
v3.16: JIT compiling
v3.18: BPF maps
v4.1: attach BPF programs to kprobes
v4.7: attach BPF programs to tracepoints
v4.8:
...
XDP (https://www.iovisor.org/technology/xdp)
eBPF MapseBPF Maps
15+ map types: BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_ARRAY,
BPF_MAP_TYPE_PROG_ARRAY, BPF_MAP_TYPE_PERF_EVENT_ARRAY, ...
associated to a userspace process
read/written by userspace process, eBPF programs
eBPF Map OperationseBPF Map Operations
int bpf_create_map(enum bpf_map_type map_type, unsigned int key_size, unsigned
int value_size, unsigned int max_entries)
int bpf_lookup_elem(int fd, const void *key, void *value)
int bpf_update_elem(int fd, const void *key, const void *value, uint64_t flags
)
int bpf_delete_elem(int fd, const void *key)
int bpf_get_next_key(int fd, const void *key, void *next_key)
eBPF ProgramseBPF Programs
20+ program types: BPF_PROG_TYPE_SOCKET_FILTER,
BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT,
BPF_PROG_TYPE_XDP, ...
associated to a userspace process
event-based execution (e.g. tracepoint hooks)
executed by BPF VM
safe
ef cient
eBPF Program OperationseBPF Program Operations
int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns, int i
nsn_cnt, const char *license)
eBPF Program as C structeBPF Program as C struct
struct bpf_insn prog[] = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), /* r6 = r1 */
BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol)),
/* r0 = ip->proto */
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),
/* *(u32 *)(fp - 4) = r0 */
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), /* r2 = fp */
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = r2 - 4 */
BPF_LD_MAP_FD(BPF_REG_1, map_fd), /* r1 = map_fd */
BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
/* r0 = map_lookup(r1, r2) */
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
/* if (r0 == 0) goto pc+2 */
BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0),
/* lock *(u64 *) r0 += r1 */
BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
BPF_EXIT_INSN(), /* return r0 */
};
eBPF Program as C CodeeBPF Program as C Code
Can be compiled with LLVM/Clang using the BPF backend.
int bpf_prog1(struct pt_regs *ctx)
{
/* attaches to kprobe netif_receive_skb,
* looks for packets on loobpack device and prints them
*/
char devname[IFNAMSIZ];
struct net_device *dev;
struct sk_buff *skb;
int len;
/* non-portable! works for the given kernel only */
skb = (struct sk_buff *) PT_REGS_PARM1(ctx);
dev = _(skb->dev);
len = _(skb->len);
bpf_probe_read(devname, sizeof(devname), dev->name);
if (devname[0] == 'l' && devname[1] == 'o') {
char fmt[] = "skb %p len %dn";
/* using bpf_trace_printk() for DEBUG ONLY */
bpf_trace_printk(fmt, sizeof(fmt), skb, len);
}
return 0;
}
eBPF-based MonitoringeBPF-based Monitoring
user
kernel
network
monitor
BPF map
BPF prog
BPF prog hook
hook
eBPF
eBPF Work ow: Linux Kernel BPF SampleseBPF Work ow: Linux Kernel BPF Samples
see linux/samples/bpf
eBPF kernel part (.c)
contains map and program de nitions
compiled with LLVM -> .o
eBPF user part (.c)
compiles to executable
extracts maps and programs from kernel part (.o)
creates maps: bpf_create_map
relocates maps in program codes
loads programs: bpf_prog_load
reads maps and generates output
eBPF Work ow:eBPF Work ow: iovisor/bcc
see
single Python script that contains:
de nition of eBPF maps
de nition of eBPF programs (as LLVM compatible C code)
code to read and process the maps
C code is compiled when the script starts (LLVM)
https://github.com/iovisor/bcc (https://github.com/iovisor/bcc)
eBPF ExampleeBPF Example
https://github.com/iovisor/bcc/blob/master/tools/ lelife.py
(https://github.com/iovisor/bcc/blob/master/tools/ lelife.py)
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# filelife Trace the lifespan of short-lived files.
# For Linux, uses BCC, eBPF. Embedded C.
#
# This traces the creation and deletion of files, providing information
# on who deleted the file, the file age, and the file name. The intent is to
# provide information on short-lived files, for debugging or performance
# analysis.
#
# USAGE: filelife [-h] [-p PID]
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 08-Feb-2015 Brendan Gregg Created this.
# 17-Feb-2016 Allan McAleavy updated for BPF_PERF_OUTPUT
from __future__ import print_function
from bcc import BPF
import argparse
from time import strftime
import ctypes as ct
# arguments
examples = """examples:
./filelife # trace all stat() syscalls
./filelife -p 181 # only trace PID 181
"""
parser = argparse.ArgumentParser(
description="Trace stat() syscalls",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-p", "--pid",
help="trace this PID only")
parser.add_argument("--ebpf", action="store_true",
help=argparse.SUPPRESS)
args = parser.parse_args()
debug = 0
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>
#include <linux/sched.h>
struct data_t {
u32 pid;
u64 delta;
char comm[TASK_COMM_LEN];
char fname[DNAME_INLINE_LEN];
};
BPF_HASH(birth, struct dentry *);
BPF_PERF_OUTPUT(events);
// trace file creation time
int trace_create(struct pt_regs *ctx, struct inode *dir, struct dentry *dentr
y)
{
u32 pid = bpf_get_current_pid_tgid();
FILTER
u64 ts = bpf_ktime_get_ns();
birth.update(&dentry, &ts);
return 0;
};
// trace file deletion and output details
int trace_unlink(struct pt_regs *ctx, struct inode *dir, struct dentry *dentr
y)
{
struct data_t data = {};
u32 pid = bpf_get_current_pid_tgid();
FILTER
u64 *tsp, delta;
tsp = birth.lookup(&dentry);
if (tsp == 0) {
return 0; // missed create
}
delta = (bpf_ktime_get_ns() - *tsp) / 1000000;
birth.delete(&dentry);
struct qstr d_name = dentry->d_name;
if (d_name.len == 0)
return 0;
if (bpf_get_current_comm(&data.comm, sizeof(data.comm)) == 0) {
data.pid = pid;
data.delta = delta;
bpf_probe_read(&data.fname, sizeof(data.fname), d_name.name);
}
events.perf_submit(ctx, &data, sizeof(data));
return 0;
}
"""
TASK_COMM_LEN = 16 # linux/sched.h
DNAME_INLINE_LEN = 255 # linux/dcache.h
class Data(ct.Structure):
_fields_ = [
("pid", ct.c_uint),
("delta", ct.c_ulonglong),
("comm", ct.c_char * TASK_COMM_LEN),
("fname", ct.c_char * DNAME_INLINE_LEN)
]
if args.pid:
bpf_text = bpf_text.replace('FILTER',
'if (pid != %s) { return 0; }' % args.pid)
else:
bpf_text = bpf_text.replace('FILTER', '')
if debug or args.ebpf:
print(bpf_text)
if args.ebpf:
exit()
# initialize BPF
PrometheusPrometheus
https://prometheus.io/docs/introduction/overview/#architecture
(https://prometheus.io/docs/introduction/overview/#architecture)
Prometheus eBPF ExporterPrometheus eBPF Exporter
leverages on tools made by iovisor/bcc
written in Go
exported metrics are de ned as yaml les
Motivation of this exporter is to allow you to write eBPF code and
export metrics that are not otherwise accessible from the Linux
kernel.
https://github.com/cloud are/ebpf_exporter
(https://github.com/cloud are/ebpf_exporter)
Prometheus eBPF Exporter - ExamplePrometheus eBPF Exporter - Example
https://github.com/cloud are/ebpf_exporter/blob/master/examples/bio.yaml
(https://github.com/cloud are/ebpf_exporter/blob/master/examples/bio.yaml)
programs:
# See:
# * https://github.com/iovisor/bcc/blob/master/tools/biolatency.py
# * https://github.com/iovisor/bcc/blob/master/tools/biolatency_example.txt
#
# See also: bio-tracepoints.yaml
- name: bio
metrics:
histograms:
- name: bio_latency_seconds
help: Block IO latency histogram
table: io_latency
bucket_type: exp2
bucket_min: 0
bucket_max: 26
bucket_multiplier: 0.000001 # microseconds to seconds
labels:
- name: device
size: 32
decoders:
- name: string
- name: operation
size: 8
decoders:
- name: uint
- name: static_map
static_map:
1: read
2: write
- name: bucket
size: 8
decoders:
- name: uint
- name: bio_size_bytes
help: Block IO size histogram with kibibyte buckets
table: io_size
bucket_type: exp2
bucket_min: 0
bucket_max: 15
bucket_multiplier: 1024 # kibibytes to bytes
labels:
- name: device
size: 32
decoders:
- name: string
- name: operation
size: 8
decoders:
- name: uint
- name: static_map
static_map:
1: read
2: write
- name: bucket
size: 8
decoders:
- name: uint
kprobes:
blk_start_request: trace_req_start
blk_mq_start_request: trace_req_start
blk_account_io_completion: trace_req_completion
code: |
#include <linux/blkdev.h>
#include <linux/blk_types.h>
typedef struct disk_key {
char disk[32];
u8 op;
u64 slot;
} disk_key_t;
// Max number of disks we expect to see on the host
const u8 max_disks = 255;
// 27 buckets for latency, max range is 33.6s .. 67.1s
const u8 max_latency_slot = 26;
// 16 buckets per disk in kib, max range is 16mib .. 32mib
const u8 max_size_slot = 15;
// Hash to temporily hold the start time of each bio request, max 10k in
-flight by default
BPF_HASH(start, struct request *);
// Histograms to record latencies
BPF_HISTOGRAM(io_latency, disk_key_t, (max_latency_slot + 1) * max_disk
s);
// Histograms to record sizes
BPF_HISTOGRAM(io_size, disk_key_t, (max_size_slot + 1) * max_disks);
// Record start time of a request
int trace_req_start(struct pt_regs *ctx, struct request *req) {
u64 ts = bpf_ktime_get_ns();
start.update(&req, &ts);
return 0;
}
// Calculate request duration and store in appropriate histogram bucket
int trace_req_completion(struct pt_regs *ctx, struct request *req, unsig
ned int bytes) {
u64 *tsp, delta;
// Fetch timestamp and calculate delta
tsp = start.lookup(&req);
if (tsp == 0) {
return 0; // missed issue
}
// There are write request with zero length on sector zero,
// which do not seem to be real writes to device.
if (req->__sector == 0 && req->__data_len == 0) {
return 0;
}
// Disk that received the request
struct gendisk *disk = req->rq_disk;
// Delta in nanoseconds
delta = bpf_ktime_get_ns() - *tsp;
// Convert to microseconds
delta /= 1000;
// Latency histogram key
u64 latency_slot = bpf_log2l(delta);
// Cap latency bucket at max value
if (latency_slot > max_latency_slot) {
Using eBPF for Monitoring in Cloud-NativeUsing eBPF for Monitoring in Cloud-Native
EnvironmentEnvironment
Cloud Native:
microservice architecture
containerized
orchestrated
Pitfall #1: DependenciesPitfall #1: Dependencies
bcc
LLVM
Linux kernel headers
Pitfall #2: KProbes and Kernel VersionPitfall #2: KProbes and Kernel Version
static int bpf_prog_load(union bpf_attr *attr)
{
enum bpf_prog_type type = attr->prog_type;
struct bpf_prog *prog;
int err;
char license[128];
bool is_gpl;
if (CHECK_ATTR(BPF_PROG_LOAD))
return -EINVAL;
if (attr->prog_flags & ~BPF_F_STRICT_ALIGNMENT)
return -EINVAL;
/* copy eBPF program license from user space */
if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
sizeof(license) - 1) < 0)
return -EFAULT;
license[sizeof(license) - 1] = 0;
/* eBPF programs must be GPL compatible to use GPL-ed functions */
is_gpl = license_is_gpl_compatible(license);
if (attr->insn_cnt == 0 || attr->insn_cnt > BPF_MAXINSNS)
return -E2BIG;
if (type == BPF_PROG_TYPE_KPROBE &&
attr->kern_version != LINUX_VERSION_CODE)
return -EINVAL;
/* ... */
}
Pitfall #3: KProbes and StabilityPitfall #3: KProbes and Stability
Kprobe can be created for any kernel function
Most of the Linux kernel source code is subject to change
in-kernel APIs and ABIs are unstable
Distribution-speci c kernel modi cations, propriatery kernels
Pitfall #4: Kernel SupportPitfall #4: Kernel Support
v4.1: attach BPF programs to kprobes (21 June, 2015)
v4.7: attach BPF programs to tracepoints (24 July, 2016)
RHEL 7.6 (30 October, 2018) has 3.10.0-957
Ongoing ActivitiesOngoing Activities
eBPF-based Prometheus exporter, containerized
run-time con gurable eBPF metrics
self contained
no dep on iovisor/bcc
no dep on Linux kernel headers
supporting the major Linux distributions
Thank you!Thank you!
Questions?Questions?

More Related Content

What's hot

101 3.2 process text streams using filters
101 3.2 process text streams using filters101 3.2 process text streams using filters
101 3.2 process text streams using filters
Acácio Oliveira
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
Kernel TLV
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
Brendan Gregg
 
Kernel development
Kernel developmentKernel development
Kernel development
Nuno Martins
 
Linux System Monitoring with eBPF
Linux System Monitoring with eBPFLinux System Monitoring with eBPF
Linux System Monitoring with eBPF
Heinrich Hartmann
 
Low Overhead System Tracing with eBPF
Low Overhead System Tracing with eBPFLow Overhead System Tracing with eBPF
Low Overhead System Tracing with eBPF
Akshay Kapoor
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloud
Andrea Righi
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
Brendan Gregg
 
LPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsLPC2019 BPF Tracing Tools
LPC2019 BPF Tracing Tools
Brendan Gregg
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
Kernel TLV
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPF
Brendan Gregg
 
Performance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernel
lcplcp1
 
David container security-with_falco
David container security-with_falcoDavid container security-with_falco
David container security-with_falco
Lorenzo David
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
Thomas Graf
 
Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracing
Viller Hsiao
 
System Hacking Tutorial #2 - Buffer Overflow - Overwrite EIP
System Hacking Tutorial #2 - Buffer Overflow - Overwrite EIPSystem Hacking Tutorial #2 - Buffer Overflow - Overwrite EIP
System Hacking Tutorial #2 - Buffer Overflow - Overwrite EIP
sanghwan ahn
 
Defcon 2011 network forensics 解题记录
Defcon 2011 network forensics 解题记录Defcon 2011 network forensics 解题记录
Defcon 2011 network forensics 解题记录insight-labs
 
Linux kernel-rootkit-dev - Wonokaerun
Linux kernel-rootkit-dev - WonokaerunLinux kernel-rootkit-dev - Wonokaerun
Linux kernel-rootkit-dev - Wonokaerunidsecconf
 
eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
SUSE Labs Taipei
 
Linux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compactLinux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compact
Alessandro Selli
 

What's hot (20)

101 3.2 process text streams using filters
101 3.2 process text streams using filters101 3.2 process text streams using filters
101 3.2 process text streams using filters
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
 
Kernel development
Kernel developmentKernel development
Kernel development
 
Linux System Monitoring with eBPF
Linux System Monitoring with eBPFLinux System Monitoring with eBPF
Linux System Monitoring with eBPF
 
Low Overhead System Tracing with eBPF
Low Overhead System Tracing with eBPFLow Overhead System Tracing with eBPF
Low Overhead System Tracing with eBPF
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloud
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
LPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsLPC2019 BPF Tracing Tools
LPC2019 BPF Tracing Tools
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPF
 
Performance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernel
 
David container security-with_falco
David container security-with_falcoDavid container security-with_falco
David container security-with_falco
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
 
Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracing
 
System Hacking Tutorial #2 - Buffer Overflow - Overwrite EIP
System Hacking Tutorial #2 - Buffer Overflow - Overwrite EIPSystem Hacking Tutorial #2 - Buffer Overflow - Overwrite EIP
System Hacking Tutorial #2 - Buffer Overflow - Overwrite EIP
 
Defcon 2011 network forensics 解题记录
Defcon 2011 network forensics 解题记录Defcon 2011 network forensics 解题记录
Defcon 2011 network forensics 解题记录
 
Linux kernel-rootkit-dev - Wonokaerun
Linux kernel-rootkit-dev - WonokaerunLinux kernel-rootkit-dev - Wonokaerun
Linux kernel-rootkit-dev - Wonokaerun
 
eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
 
Linux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compactLinux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compact
 

Similar to Efficient System Monitoring in Cloud Native Environments

ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!
Affan Syed
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPF
Alex Maestretti
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)
Brendan Gregg
 
Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet Filters
Kernel TLV
 
DCSF 19 eBPF Superpowers
DCSF 19 eBPF SuperpowersDCSF 19 eBPF Superpowers
DCSF 19 eBPF Superpowers
Docker, Inc.
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
Michael Kehoe
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting Started
Brendan Gregg
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
Brendan Gregg
 
An Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating SystemAn Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating System
Linaro
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDP
lcplcp1
 
DISTRIBUTED PERFORMANCE ANALYSIS USING INFLUXDB AND THE LINUX EBPF VIRTUAL MA...
DISTRIBUTED PERFORMANCE ANALYSIS USING INFLUXDB AND THE LINUX EBPF VIRTUAL MA...DISTRIBUTED PERFORMANCE ANALYSIS USING INFLUXDB AND THE LINUX EBPF VIRTUAL MA...
DISTRIBUTED PERFORMANCE ANALYSIS USING INFLUXDB AND THE LINUX EBPF VIRTUAL MA...
InfluxData
 
Kernel bug hunting
Kernel bug huntingKernel bug hunting
Kernel bug hunting
Andrea Righi
 
Modern Linux Tracing Landscape
Modern Linux Tracing LandscapeModern Linux Tracing Landscape
Modern Linux Tracing Landscape
Sasha Goldshtein
 
Linux kernel bug hunting
Linux kernel bug huntingLinux kernel bug hunting
Linux kernel bug hunting
Andrea Righi
 
App container rkt
App container rktApp container rkt
App container rkt
Xiaofeng Guo
 
eBPF - Observability In Deep
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In Deep
Mydbops
 
Basic Linux kernel
Basic Linux kernelBasic Linux kernel
Basic Linux kernel
Morteza Nourelahi Alamdari
 
Check the version with fixes. Link in description
Check the version with fixes. Link in descriptionCheck the version with fixes. Link in description
Check the version with fixes. Link in description
Przemyslaw Koltermann
 
Introduction of eBPF - 時下最夯的Linux Technology
Introduction of eBPF - 時下最夯的Linux Technology Introduction of eBPF - 時下最夯的Linux Technology
Introduction of eBPF - 時下最夯的Linux Technology
Jace Liang
 
Debugging Python with gdb
Debugging Python with gdbDebugging Python with gdb
Debugging Python with gdb
Roman Podoliaka
 

Similar to Efficient System Monitoring in Cloud Native Environments (20)

ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPF
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)
 
Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet Filters
 
DCSF 19 eBPF Superpowers
DCSF 19 eBPF SuperpowersDCSF 19 eBPF Superpowers
DCSF 19 eBPF Superpowers
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting Started
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
 
An Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating SystemAn Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating System
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDP
 
DISTRIBUTED PERFORMANCE ANALYSIS USING INFLUXDB AND THE LINUX EBPF VIRTUAL MA...
DISTRIBUTED PERFORMANCE ANALYSIS USING INFLUXDB AND THE LINUX EBPF VIRTUAL MA...DISTRIBUTED PERFORMANCE ANALYSIS USING INFLUXDB AND THE LINUX EBPF VIRTUAL MA...
DISTRIBUTED PERFORMANCE ANALYSIS USING INFLUXDB AND THE LINUX EBPF VIRTUAL MA...
 
Kernel bug hunting
Kernel bug huntingKernel bug hunting
Kernel bug hunting
 
Modern Linux Tracing Landscape
Modern Linux Tracing LandscapeModern Linux Tracing Landscape
Modern Linux Tracing Landscape
 
Linux kernel bug hunting
Linux kernel bug huntingLinux kernel bug hunting
Linux kernel bug hunting
 
App container rkt
App container rktApp container rkt
App container rkt
 
eBPF - Observability In Deep
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In Deep
 
Basic Linux kernel
Basic Linux kernelBasic Linux kernel
Basic Linux kernel
 
Check the version with fixes. Link in description
Check the version with fixes. Link in descriptionCheck the version with fixes. Link in description
Check the version with fixes. Link in description
 
Introduction of eBPF - 時下最夯的Linux Technology
Introduction of eBPF - 時下最夯的Linux Technology Introduction of eBPF - 時下最夯的Linux Technology
Introduction of eBPF - 時下最夯的Linux Technology
 
Debugging Python with gdb
Debugging Python with gdbDebugging Python with gdb
Debugging Python with gdb
 

Recently uploaded

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 

Recently uploaded (20)

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 

Efficient System Monitoring in Cloud Native Environments

  • 1. E cient System Monitoring in Cloud NativeE cient System Monitoring in Cloud Native EnvironmentsEnvironments gergely.szabo@origoss.com
  • 2. About MyselfAbout Myself more than 15 years in the industry research, development, system architect, etc... currently at Origoss Solutions Cloud Native Kubernetes Prometheus
  • 5. Packet Filtering ProblemPacket Filtering Problem
  • 6. Filtering RequirementsFiltering Requirements Ef cient Flexible lter rules Safe
  • 7. BPFBPF Steven McCanne and Van Jacobson:Steven McCanne and Van Jacobson: The BSD Packet Filter: A New Architecture for User-levelThe BSD Packet Filter: A New Architecture for User-level Packet Capture, 1992Packet Capture, 1992 http://www.tcpdump.org/papers/bpf-usenix93.pdf (http://www.tcpdump.org/papers/bpf-usenix93.pdf)
  • 9.
  • 10. Capturing without FilteringCapturing without Filtering In [ ]: %%bash sudo tcpdump -nc 4
  • 11. Simple Filtering RuleSimple Filtering Rule In [ ]: %%bash sudo tcpdump -nc 4 tcp and port 80
  • 12. Complex RuleComplex Rule To print all IPv4 HTTP packets to and from port 80, i.e. print only packets that contain data, not, for example, SYN and FIN packets and ACK-only packets. In [ ]: %%bash sudo tcpdump -nc 4 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0 xf0)>>2)) != 0)'
  • 13. How Does This Work?How Does This Work?
  • 14.
  • 15. BPF VM Instruction SetBPF VM Instruction Set
  • 16.
  • 17. Simple Filtering RuleSimple Filtering Rule In [ ]: %%bash tcpdump -d tcp and port 80
  • 18. Complex RuleComplex Rule In [ ]: %%bash tcpdump -d 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>> 2)) != 0)'
  • 19. Linux Kernel TracepointsLinux Kernel Tracepoints A tracepoint placed in code provides a hook to call a function (probe) that you can provide at runtime. A tracepoint can be "on" or "off" When a tracepoint is "on", the function you provide is called each time the tracepoint is executed They can be used for tracing and performance accounting.
  • 20. Adding TracepointsAdding Tracepoints void blk_requeue_request(struct request_queue *q, struct request *rq) { blk_delete_timer(rq); blk_clear_rq_complete(rq); trace_block_rq_requeue(q, rq); // <- Tracepoint hook if (rq->cmd_flags & REQ_QUEUED) blk_queue_end_tag(q, rq); BUG_ON(blk_queued_rq(rq)); elv_requeue_request(q, rq); }
  • 21. List of TracepointsList of Tracepoints In [ ]: %%bash perf list tracepoint
  • 22. Tracepoints in ActionTracepoints in Action In [ ]: %%bash sudo perf stat -a -e kmem:kmalloc sleep 10
  • 23. Linux Kernel KProbesLinux Kernel KProbes dynamically break into any kernel routine and collect debugging and performance information non-disruptively. some parts of the kernel code can not be trapped two types of probes: kprobes, and kretprobes A kprobe can be inserted on virtually any instruction in the kernel. A return probe res when a speci ed function returns.
  • 24. List of KProbesList of KProbes In [ ]: %%bash sudo cat /sys/kernel/debug/kprobes/list
  • 25. Probing a Linux FunctionProbing a Linux Function In [ ]: void blk_delete_timer(struct request *req) { list_del_init(&req->timeout_list); } %%bash sudo sh -c 'echo p:demo_probe blk_delete_timer >> /sys/kernel/debug/tracing/kpro be_events'
  • 26. List of KProbesList of KProbes In [ ]: In [ ]: %%bash sudo cat /sys/kernel/debug/kprobes/list %%bash sudo perf list | grep demo
  • 27. KProbes in ActionKProbes in Action In [ ]: %%bash sudo perf stat -a -e kprobes:demo_probe sleep 10
  • 28. Removing KProbeRemoving KProbe In [ ]: In [ ]: In [ ]: %%bash sudo sh -c 'echo "-:demo_probe" >> /sys/kernel/debug/tracing/kprobe_events' %%bash sudo cat /sys/kernel/debug/kprobes/list %%bash sudo perf list | grep demo
  • 30. Recent Developments: eBPFRecent Developments: eBPF v3.15: BPF machine upgrade (64bit registers, more registers, new instruction) v3.16: JIT compiling v3.18: BPF maps v4.1: attach BPF programs to kprobes v4.7: attach BPF programs to tracepoints v4.8: ... XDP (https://www.iovisor.org/technology/xdp)
  • 31. eBPF MapseBPF Maps 15+ map types: BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_ARRAY, BPF_MAP_TYPE_PROG_ARRAY, BPF_MAP_TYPE_PERF_EVENT_ARRAY, ... associated to a userspace process read/written by userspace process, eBPF programs
  • 32. eBPF Map OperationseBPF Map Operations int bpf_create_map(enum bpf_map_type map_type, unsigned int key_size, unsigned int value_size, unsigned int max_entries) int bpf_lookup_elem(int fd, const void *key, void *value) int bpf_update_elem(int fd, const void *key, const void *value, uint64_t flags ) int bpf_delete_elem(int fd, const void *key) int bpf_get_next_key(int fd, const void *key, void *next_key)
  • 33. eBPF ProgramseBPF Programs 20+ program types: BPF_PROG_TYPE_SOCKET_FILTER, BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_XDP, ... associated to a userspace process event-based execution (e.g. tracepoint hooks) executed by BPF VM safe ef cient
  • 34. eBPF Program OperationseBPF Program Operations int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns, int i nsn_cnt, const char *license)
  • 35. eBPF Program as C structeBPF Program as C struct struct bpf_insn prog[] = { BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), /* r6 = r1 */ BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol)), /* r0 = ip->proto */ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), /* r2 = fp */ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = r2 - 4 */ BPF_LD_MAP_FD(BPF_REG_1, map_fd), /* r1 = map_fd */ BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem), /* r0 = map_lookup(r1, r2) */ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* if (r0 == 0) goto pc+2 */ BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */ BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* lock *(u64 *) r0 += r1 */ BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */ BPF_EXIT_INSN(), /* return r0 */ };
  • 36. eBPF Program as C CodeeBPF Program as C Code Can be compiled with LLVM/Clang using the BPF backend. int bpf_prog1(struct pt_regs *ctx) { /* attaches to kprobe netif_receive_skb, * looks for packets on loobpack device and prints them */ char devname[IFNAMSIZ]; struct net_device *dev; struct sk_buff *skb; int len; /* non-portable! works for the given kernel only */ skb = (struct sk_buff *) PT_REGS_PARM1(ctx); dev = _(skb->dev); len = _(skb->len); bpf_probe_read(devname, sizeof(devname), dev->name); if (devname[0] == 'l' && devname[1] == 'o') { char fmt[] = "skb %p len %dn"; /* using bpf_trace_printk() for DEBUG ONLY */ bpf_trace_printk(fmt, sizeof(fmt), skb, len); } return 0; }
  • 38. eBPF Work ow: Linux Kernel BPF SampleseBPF Work ow: Linux Kernel BPF Samples see linux/samples/bpf eBPF kernel part (.c) contains map and program de nitions compiled with LLVM -> .o eBPF user part (.c) compiles to executable extracts maps and programs from kernel part (.o) creates maps: bpf_create_map relocates maps in program codes loads programs: bpf_prog_load reads maps and generates output
  • 39. eBPF Work ow:eBPF Work ow: iovisor/bcc see single Python script that contains: de nition of eBPF maps de nition of eBPF programs (as LLVM compatible C code) code to read and process the maps C code is compiled when the script starts (LLVM) https://github.com/iovisor/bcc (https://github.com/iovisor/bcc)
  • 40. eBPF ExampleeBPF Example https://github.com/iovisor/bcc/blob/master/tools/ lelife.py (https://github.com/iovisor/bcc/blob/master/tools/ lelife.py)
  • 41. #!/usr/bin/python # @lint-avoid-python-3-compatibility-imports # # filelife Trace the lifespan of short-lived files. # For Linux, uses BCC, eBPF. Embedded C. # # This traces the creation and deletion of files, providing information # on who deleted the file, the file age, and the file name. The intent is to # provide information on short-lived files, for debugging or performance # analysis. # # USAGE: filelife [-h] [-p PID] # # Copyright 2016 Netflix, Inc. # Licensed under the Apache License, Version 2.0 (the "License") # # 08-Feb-2015 Brendan Gregg Created this. # 17-Feb-2016 Allan McAleavy updated for BPF_PERF_OUTPUT from __future__ import print_function from bcc import BPF import argparse from time import strftime import ctypes as ct # arguments examples = """examples: ./filelife # trace all stat() syscalls ./filelife -p 181 # only trace PID 181 """ parser = argparse.ArgumentParser( description="Trace stat() syscalls", formatter_class=argparse.RawDescriptionHelpFormatter, epilog=examples) parser.add_argument("-p", "--pid", help="trace this PID only") parser.add_argument("--ebpf", action="store_true",
  • 42. help=argparse.SUPPRESS) args = parser.parse_args() debug = 0 # define BPF program bpf_text = """ #include <uapi/linux/ptrace.h> #include <linux/fs.h> #include <linux/sched.h> struct data_t { u32 pid; u64 delta; char comm[TASK_COMM_LEN]; char fname[DNAME_INLINE_LEN]; }; BPF_HASH(birth, struct dentry *); BPF_PERF_OUTPUT(events); // trace file creation time int trace_create(struct pt_regs *ctx, struct inode *dir, struct dentry *dentr y) { u32 pid = bpf_get_current_pid_tgid(); FILTER u64 ts = bpf_ktime_get_ns(); birth.update(&dentry, &ts); return 0; }; // trace file deletion and output details int trace_unlink(struct pt_regs *ctx, struct inode *dir, struct dentry *dentr y) { struct data_t data = {}; u32 pid = bpf_get_current_pid_tgid(); FILTER u64 *tsp, delta; tsp = birth.lookup(&dentry); if (tsp == 0) { return 0; // missed create
  • 43. } delta = (bpf_ktime_get_ns() - *tsp) / 1000000; birth.delete(&dentry); struct qstr d_name = dentry->d_name; if (d_name.len == 0) return 0; if (bpf_get_current_comm(&data.comm, sizeof(data.comm)) == 0) { data.pid = pid; data.delta = delta; bpf_probe_read(&data.fname, sizeof(data.fname), d_name.name); } events.perf_submit(ctx, &data, sizeof(data)); return 0; } """ TASK_COMM_LEN = 16 # linux/sched.h DNAME_INLINE_LEN = 255 # linux/dcache.h class Data(ct.Structure): _fields_ = [ ("pid", ct.c_uint), ("delta", ct.c_ulonglong), ("comm", ct.c_char * TASK_COMM_LEN), ("fname", ct.c_char * DNAME_INLINE_LEN) ] if args.pid: bpf_text = bpf_text.replace('FILTER', 'if (pid != %s) { return 0; }' % args.pid) else: bpf_text = bpf_text.replace('FILTER', '') if debug or args.ebpf: print(bpf_text) if args.ebpf: exit() # initialize BPF
  • 45. Prometheus eBPF ExporterPrometheus eBPF Exporter leverages on tools made by iovisor/bcc written in Go exported metrics are de ned as yaml les Motivation of this exporter is to allow you to write eBPF code and export metrics that are not otherwise accessible from the Linux kernel. https://github.com/cloud are/ebpf_exporter (https://github.com/cloud are/ebpf_exporter)
  • 46. Prometheus eBPF Exporter - ExamplePrometheus eBPF Exporter - Example https://github.com/cloud are/ebpf_exporter/blob/master/examples/bio.yaml (https://github.com/cloud are/ebpf_exporter/blob/master/examples/bio.yaml)
  • 47. programs: # See: # * https://github.com/iovisor/bcc/blob/master/tools/biolatency.py # * https://github.com/iovisor/bcc/blob/master/tools/biolatency_example.txt # # See also: bio-tracepoints.yaml - name: bio metrics: histograms: - name: bio_latency_seconds help: Block IO latency histogram table: io_latency bucket_type: exp2 bucket_min: 0 bucket_max: 26 bucket_multiplier: 0.000001 # microseconds to seconds labels: - name: device size: 32 decoders: - name: string - name: operation size: 8 decoders: - name: uint - name: static_map static_map: 1: read 2: write - name: bucket size: 8 decoders: - name: uint - name: bio_size_bytes help: Block IO size histogram with kibibyte buckets table: io_size bucket_type: exp2
  • 48. bucket_min: 0 bucket_max: 15 bucket_multiplier: 1024 # kibibytes to bytes labels: - name: device size: 32 decoders: - name: string - name: operation size: 8 decoders: - name: uint - name: static_map static_map: 1: read 2: write - name: bucket size: 8 decoders: - name: uint kprobes: blk_start_request: trace_req_start blk_mq_start_request: trace_req_start blk_account_io_completion: trace_req_completion code: | #include <linux/blkdev.h> #include <linux/blk_types.h> typedef struct disk_key { char disk[32]; u8 op; u64 slot; } disk_key_t; // Max number of disks we expect to see on the host const u8 max_disks = 255; // 27 buckets for latency, max range is 33.6s .. 67.1s const u8 max_latency_slot = 26; // 16 buckets per disk in kib, max range is 16mib .. 32mib const u8 max_size_slot = 15;
  • 49. // Hash to temporily hold the start time of each bio request, max 10k in -flight by default BPF_HASH(start, struct request *); // Histograms to record latencies BPF_HISTOGRAM(io_latency, disk_key_t, (max_latency_slot + 1) * max_disk s); // Histograms to record sizes BPF_HISTOGRAM(io_size, disk_key_t, (max_size_slot + 1) * max_disks); // Record start time of a request int trace_req_start(struct pt_regs *ctx, struct request *req) { u64 ts = bpf_ktime_get_ns(); start.update(&req, &ts); return 0; } // Calculate request duration and store in appropriate histogram bucket int trace_req_completion(struct pt_regs *ctx, struct request *req, unsig ned int bytes) { u64 *tsp, delta; // Fetch timestamp and calculate delta tsp = start.lookup(&req); if (tsp == 0) { return 0; // missed issue } // There are write request with zero length on sector zero, // which do not seem to be real writes to device. if (req->__sector == 0 && req->__data_len == 0) { return 0; } // Disk that received the request struct gendisk *disk = req->rq_disk; // Delta in nanoseconds delta = bpf_ktime_get_ns() - *tsp; // Convert to microseconds delta /= 1000; // Latency histogram key u64 latency_slot = bpf_log2l(delta); // Cap latency bucket at max value if (latency_slot > max_latency_slot) {
  • 50. Using eBPF for Monitoring in Cloud-NativeUsing eBPF for Monitoring in Cloud-Native EnvironmentEnvironment Cloud Native: microservice architecture containerized orchestrated
  • 51. Pitfall #1: DependenciesPitfall #1: Dependencies bcc LLVM Linux kernel headers
  • 52. Pitfall #2: KProbes and Kernel VersionPitfall #2: KProbes and Kernel Version static int bpf_prog_load(union bpf_attr *attr) { enum bpf_prog_type type = attr->prog_type; struct bpf_prog *prog; int err; char license[128]; bool is_gpl; if (CHECK_ATTR(BPF_PROG_LOAD)) return -EINVAL; if (attr->prog_flags & ~BPF_F_STRICT_ALIGNMENT) return -EINVAL; /* copy eBPF program license from user space */ if (strncpy_from_user(license, u64_to_user_ptr(attr->license), sizeof(license) - 1) < 0) return -EFAULT; license[sizeof(license) - 1] = 0; /* eBPF programs must be GPL compatible to use GPL-ed functions */ is_gpl = license_is_gpl_compatible(license); if (attr->insn_cnt == 0 || attr->insn_cnt > BPF_MAXINSNS) return -E2BIG; if (type == BPF_PROG_TYPE_KPROBE && attr->kern_version != LINUX_VERSION_CODE) return -EINVAL; /* ... */ }
  • 53. Pitfall #3: KProbes and StabilityPitfall #3: KProbes and Stability Kprobe can be created for any kernel function Most of the Linux kernel source code is subject to change in-kernel APIs and ABIs are unstable Distribution-speci c kernel modi cations, propriatery kernels
  • 54. Pitfall #4: Kernel SupportPitfall #4: Kernel Support v4.1: attach BPF programs to kprobes (21 June, 2015) v4.7: attach BPF programs to tracepoints (24 July, 2016) RHEL 7.6 (30 October, 2018) has 3.10.0-957
  • 55. Ongoing ActivitiesOngoing Activities eBPF-based Prometheus exporter, containerized run-time con gurable eBPF metrics self contained no dep on iovisor/bcc no dep on Linux kernel headers supporting the major Linux distributions