Grzegorz Nosek,
Garbage In, Garbage Out
syscalls in, syscalls out*
*may contain traces of signals and shared memory
Syscalls 
ssize_t read(int fd, void *buf, size_t count); 
ssize_t write(int fd, const void *buf, size_t count); 
int open(const char *pathname, int flags, mode_t mode); 
int close(int fd); 
int stat(const char *path, struct stat *buf); 
... 
$ grep -c __NR_ /usr/include/asm/unistd_64.h 
313 
$ man 2 read
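To make the prototypes above a bit more tangible, here is a minimal sketch in Python (an assumption on my part; the slides only show the C prototypes). Each `os.*` call is a thin wrapper that ends up in the corresponding system call on Linux, although the exact name a tracer shows can differ (for example `openat` instead of `open`). The scratch file name is made up for illustration.

```python
import os
import tempfile


def roundtrip(payload: bytes) -> bytes:
    """Write payload to a scratch file and read it back with os-level calls."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "demo.txt")  # hypothetical file name

    # open(2) with flags and mode, then write(2) and close(2)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)  # a short write is possible in general; fine for tiny payloads
    finally:
        os.close(fd)

    size = os.stat(path).st_size  # stat(2)

    # open(2) again, read(2) the whole file, close(2)
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.read(fd, size)
    finally:
        os.close(fd)

    # clean up the scratch file and directory
    os.remove(path)
    os.rmdir(directory)
    return data
```

Running a script that calls `roundtrip(b"hello\n")` under `strace` should show these calls among the interpreter's own startup noise.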
strace 
# strace cat /etc/hostname 
execve("/bin/cat", ["cat", "/etc/hostname"], ... 
brk(0) = 0x1675000 
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT 
mmap(NULL, 8192, PROT_READ|PROT_WRITE, ... 
access("/etc/ld.so.preload", R_OK) = -1 ENOENT 
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 
fstat(3, {st_mode=S_IFREG|0644, st_size=48577, ...}) = 0 
mmap(NULL, 48577, PROT_READ, MAP_PRIVATE, 3, 0) = ... 
close(3) = 0 
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT 
(...)
that’s cool, but… 
1 million syscalls, as fast as possible 
worst case for any tracer 
# dd if=/dev/zero of=/dev/null bs=1k count=1M 
1048576+0 records in 
1048576+0 records out 
1073741824 bytes (1.1 GB) copied, 0.332905 s, 3.2 GB/s 
# strace -o /dev/null !! 
1048576+0 records in 
1048576+0 records out 
1073741824 bytes (1.1 GB) copied, 18.2365 s, 58.9 MB/s 
50x overhead
@brendangregg’s 
http://www.slideshare.net/brendangregg/linux-performance-tools-2014
hello, sysdig 
# sysdig | head -5 
3 15:26:36.552482922 0 sysdig (7311) > switch 
next=329(systemd-udevd) pgft_maj=6 pgft_min=1432 
vm_size=26740 vm_rss=3052 vm_swap=0 
4 15:26:36.552502349 0 systemd-udevd (329) < read 
res=2352 data=# This file is part of systemd..#.# 
systemd is free software; you can redistri 
5 15:26:36.552590722 0 systemd-udevd (329) > read 
fd=12(<f>/lib/udev/rules.d/42-usb-hid-pm.rules) 
size=4096 
6 15:26:36.552593880 0 systemd-udevd (329) < read 
res=0 data= 
7 15:26:36.552596220 0 systemd-udevd (329) > close 
fd=12(<f>/lib/udev/rules.d/42-usb-hid-pm.rules)
reading each event line, left to right:
• event number, timestamp
• CPU number, process name, pid
• event direction (> is an enter event, < is an exit event), event type (the syscall)
• arbitrary event attributes
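The field layout described here can be sketched as a small parser. This is only an illustration in Python, assuming the default output format shown on the slide, with each event joined back onto a single line and a process name that contains no spaces; the `Event` type and `parse_line` function are hypothetical names, not part of sysdig.

```python
import re
from typing import NamedTuple, Optional


class Event(NamedTuple):
    num: int         # event number
    time: str        # timestamp, kept as text
    cpu: int         # CPU number
    proc: str        # process name
    pid: int         # process id
    direction: str   # ">" for enter, "<" for exit
    evt_type: str    # event type, e.g. the syscall name
    args: str        # remaining event attributes, unparsed


# Assumed layout: num time cpu proc (pid) dir type [attributes...]
LINE_RE = re.compile(
    r"^(\d+) (\d{2}:\d{2}:\d{2}\.\d+) (\d+) (\S+) \((\d+)\) ([<>]) (\S+)\s*(.*)$"
)


def parse_line(line: str) -> Optional[Event]:
    """Split one sysdig output line into its fields, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    num, time, cpu, proc, pid, direction, evt_type, args = m.groups()
    return Event(int(num), time, int(cpu), proc, int(pid), direction, evt_type, args)


sample = ("5 15:26:36.552590722 0 systemd-udevd (329) > read "
          "fd=12(<f>/lib/udev/rules.d/42-usb-hid-pm.rules) size=4096")
event = parse_line(sample)
```

For anything beyond a quick look, asking sysdig for the fields directly with `-p` is likely more robust than parsing the default output.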
events 
# sysdig -L 
(...) 
> open() 
< open(FD fd, FSPATH name, FLAGS32 flags, UINT32 mode) 
> close(FD fd) 
< close(ERRNO res) 
> read(FD fd, UINT32 size) 
< read(ERRNO res, BYTEBUF data) 
> write(FD fd, UINT32 size) 
< write(ERRNO res, BYTEBUF data) 
(...)
dd + kernel: 1073741824 bytes (1.1 GB) copied, 0.332905 s, 3.2 GB/s
dd + kernel + strace: 1073741824 bytes (1.1 GB) copied, 18.2365 s, 58.9 MB/s
dd + kernel + sysdig (ring buffer): 1073741824 bytes (1.1 GB) copied, 1.30029 s, 826 MB/s
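The three timings can be turned into slowdown factors and a rough per-block cost. The sketch below uses only the numbers from the `dd` runs on the slides; the assumption that each 1 KiB block costs `dd` one `read` and one `write` is mine, and the function names are made up for illustration.

```python
# Wall-clock times reported by dd for 1048576 blocks of 1 KiB
BASELINE_S = 0.332905   # dd alone
STRACE_S = 18.2365      # dd under strace -o /dev/null
SYSDIG_S = 1.30029      # dd while sysdig is capturing
BLOCKS = 1048576


def slowdown(traced_s: float, baseline_s: float = BASELINE_S) -> float:
    """How many times slower the traced run is than the untraced one."""
    return traced_s / baseline_s


def extra_ns_per_block(traced_s: float, baseline_s: float = BASELINE_S) -> float:
    """Added cost per block in nanoseconds (assumed: one read plus one write per block)."""
    return (traced_s - baseline_s) / BLOCKS * 1e9


if __name__ == "__main__":
    for name, t in (("strace", STRACE_S), ("sysdig", SYSDIG_S)):
        print(f"{name}: {slowdown(t):.1f}x, +{extra_ns_per_block(t):.0f} ns per block")
```

With these inputs the strace run works out to roughly 55x the baseline and the sysdig run to roughly 3.9x, i.e. on the order of 17 µs versus under 1 µs of added cost per block.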
filters 
fd.name      FD full name. If the fd is a file, this field contains the full path.
             If the FD is a socket, this field contain the connection tuple. 
proc.apid    the pid of one of the process ancestors. 
evt.latency  delta between an exit event and the correspondent enter event. 
(...) 
# sysdig -l | grep -Ec '^[a-z0-9_.]+' 
88
filters 
# sysdig fd.name contains shadow 
2303 17:30:34.645573185 0 cat (24012) < open 
fd=-13(EACCES) name=/etc/shadow flags=1(O_RDONLY) 
mode=0 
# sysdig evt.res = EACCES or evt.res = EPERM 
617 17:32:16.197820784 0 cat (24027) < open 
fd=-13(EACCES) name=/etc/shadow flags=1(O_RDONLY) 
mode=0 
4333 17:32:26.239052264 0 killall (24028) < kill 
res=-1(EPERM)
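As a mental model only, the two filters above behave like predicates applied to every event. The sketch below mimics them in Python over plain dictionaries keyed by field name; this is not how sysdig implements filtering, and the sample events (including the successful third one) are made up for illustration.

```python
from typing import Dict, Iterable, List

Event = Dict[str, str]


def name_contains(event: Event, needle: str) -> bool:
    """Rough equivalent of: fd.name contains <needle>"""
    return needle in event.get("fd.name", "")


def res_is_any(event: Event, *codes: str) -> bool:
    """Rough equivalent of: evt.res = A or evt.res = B ..."""
    return event.get("evt.res") in codes


def select(events: Iterable[Event], predicate) -> List[Event]:
    """Keep only the events the predicate accepts."""
    return [e for e in events if predicate(e)]


# Hypothetical events, loosely modelled on the output shown on the slide
events = [
    {"proc.name": "cat", "evt.type": "open", "fd.name": "/etc/shadow", "evt.res": "EACCES"},
    {"proc.name": "killall", "evt.type": "kill", "evt.res": "EPERM"},
    {"proc.name": "cat", "evt.type": "open", "fd.name": "/etc/hostname", "evt.res": "SUCCESS"},
]

shadow_hits = select(events, lambda e: name_contains(e, "shadow"))
denied = select(events, lambda e: res_is_any(e, "EACCES", "EPERM"))
```

Note that an event without an `fd.name` (the `kill` here) simply does not match the first predicate.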
back to that dd again… 
# sysdig proc.name=not_dd > /dev/null & dd if=/dev/zero of=/dev/null bs=1k count=1M ; killall sysdig 
[1] 24070 
1048576+0 records in 
1048576+0 records out 
1073741824 bytes (1.1 GB) copied, 0.981408 s, 1.1 GB/s
output formatting 
same as filters (mostly) 
# sysdig -p '%user.name %proc.name %fd.name: %evt.res' evt.failed = true 
ubuntu cat /etc/shadow: EACCES 
ubuntu cat /usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo: ENOENT 
ubuntu cat /usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo: ENOENT 
ubuntu cat /usr/share/locale/en_US/LC_MESSAGES/libc.mo: ENOENT
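The `-p` format string is essentially a template in which each `%field` token is replaced by that field's value for the current event. A minimal Python sketch of that substitution follows; the `render` function and the `<NA>` placeholder for missing fields are my own choices for illustration, not a description of how sysdig itself handles missing fields.

```python
import re
from typing import Dict

# A field token is % followed by dot-separated lowercase identifiers, e.g. %fd.name
FIELD_RE = re.compile(r"%([a-z0-9_]+(?:\.[a-z0-9_]+)*)")


def render(fmt: str, event: Dict[str, str]) -> str:
    """Replace every %field token in fmt with the matching value from event."""
    return FIELD_RE.sub(lambda m: str(event.get(m.group(1), "<NA>")), fmt)


fmt = "%user.name %proc.name %fd.name: %evt.res"
event = {
    "user.name": "ubuntu",
    "proc.name": "cat",
    "fd.name": "/etc/shadow",
    "evt.res": "EACCES",
}
line = render(fmt, event)
```

Because the token pattern stops at the first character that cannot belong to a field name, the colon after `%fd.name` is left in place as literal text.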
bottleneck in a haystack 
# sysdig -p '%evt.latency.s.%evt.latency.ns %evt.dir %evt.type %fd.name' fd.type contains ip and fd.sport != 22 
(...) 
0.000000000 >sendto 192.168.1.118:36220->46.28.247.84:80 
0.000114365 <sendto 192.168.1.118:36220->46.28.247.84:80 
0.000000000 >recvfrom 192.168.1.118:36220->46.28.247.84:80 
0.000005090 <recvfrom 192.168.1.118:36220->46.28.247.84:80 
0.000000000 >close 192.168.1.118:36220->46.28.247.84:80 
0.000001587 <close 192.168.1.118:36220->46.28.247.84:80
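The latency column follows the definition given earlier: the delta between an exit event and the corresponding enter event, which is why every enter line shows 0. One way to picture the computation is to remember the enter timestamp per thread and event type, then subtract it when the exit arrives. The Python sketch below does that over hypothetical timestamps chosen to reproduce the deltas on the slide; the tuple layout and function name are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# (timestamp in ns, thread id, direction, event type); layout is hypothetical
Event = Tuple[int, int, str, str]


def latencies(events: Iterable[Event]) -> List[Tuple[str, int]]:
    """Pair each exit event with its enter event and return (type, delta in ns)."""
    pending: Dict[Tuple[int, str], int] = {}
    result: List[Tuple[str, int]] = []
    for ts, tid, direction, evt_type in events:
        key = (tid, evt_type)
        if direction == ">":
            pending[key] = ts                      # remember when the call started
        elif direction == "<" and key in pending:
            result.append((evt_type, ts - pending.pop(key)))
    return result


# Made-up timestamps for thread 1, giving the same deltas as the slide output
trace = [
    (1_000_000, 1, ">", "sendto"),
    (1_114_365, 1, "<", "sendto"),
    (1_200_000, 1, ">", "recvfrom"),
    (1_205_090, 1, "<", "recvfrom"),
    (1_300_000, 1, ">", "close"),
    (1_301_587, 1, "<", "close"),
]
deltas = latencies(trace)
```

Exit events with no recorded enter (for example when the capture started mid-call) are simply skipped in this sketch.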
shit’s on fire, yo 
sysdig -w -> .scap file -> sysdig -r, sysdig -r, sysdig -r 
capture trace file, restore service 
analyze trace at your leisure
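The capture-then-analyze workflow boils down to one `sysdig -w` invocation during the incident and any number of `sysdig -r` invocations afterwards. The sketch below only builds the argument lists in Python (so they could be handed to `subprocess.run`); it uses just the `-w`, `-r` and `-p` options that appear in these slides, and the trace file name and helper names are hypothetical.

```python
from typing import List, Optional


def capture_cmd(trace_path: str, event_filter: Optional[str] = None) -> List[str]:
    """Command line that writes captured events to a trace file."""
    cmd = ["sysdig", "-w", trace_path]
    if event_filter:
        cmd.append(event_filter)
    return cmd


def replay_cmd(trace_path: str,
               event_filter: Optional[str] = None,
               output_format: Optional[str] = None) -> List[str]:
    """Command line that reads a trace file back, optionally filtered and formatted."""
    cmd = ["sysdig", "-r", trace_path]
    if output_format:
        cmd += ["-p", output_format]
    if event_filter:
        cmd.append(event_filter)
    return cmd


# One capture, several different looks at the same data ("incident.scap" is made up)
capture = capture_cmd("incident.scap")
failed = replay_cmd("incident.scap", "evt.failed = true",
                    "%user.name %proc.name %fd.name: %evt.res")
shadow = replay_cmd("incident.scap", "fd.name contains shadow")
```

Passing the filter as a single argument avoids shell quoting issues when the list is executed without a shell.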
lies, damn lies and benchmarks 
sysdig -w -> .scap file -> sysdig -r, sysdig -r, sysdig -r 
do a single benchmark run 
analyze/postprocess lots of ways
chisels: higher level of awesome 
Lua 
sysdig -cl 
sysdig -i chisel_name 
sysdig -c chisel_name [args…]
chisel all the things! 
# sysdig -cl | grep -c ^[a-z] 
37 
# find /usr/share/sysdig/chisels/ -name '*.lua' | wc -l 
42 
the extra ones are utilities to use in chisels 
(json, ANSI terminal, etc.)
chisels: performance 
bottlenecks Slowest system calls 
fileslower Trace slow file I/O 
netlower Trace slow network I/O 
proc_exec_time Show process execution time 
scallslower Trace slow syscalls 
topscalls Top system calls by number of calls 
topscalls_time Top system calls by time 
yup, netlower is a typo ;)
chisels: security 
list_login_shells List the login shell IDs 
shellshock_detect print shellshock attacks 
spy_users Display interactive user activity 
power corrupts, 
absolute power is even more fun
All right gentlemen, 
we need some system info 
lsof, ps, netstat 
lsof, ps, netstat 
with time travel 
http://draios.com/ps-lsof-netstat-time-travel/
gotcha!
version 0.1.91 
do you feel lucky? 
• some syscalls not yet implemented (no args) 
• it did crash once (fixed immediately though) 
• PID namespaces ignored 
• root/privileged user only 
• one sysdig process at a time 
way better than strace though