SystemTap
The all-purpose observation tool for Linux
褚霸 (Chu Ba)
Core Systems Database Group
chuba@taobao.com
http://yufeng.info
2010/11/18
Agenda
Introducing SystemTap
Installation and system requirements
Practical examples
References and miscellany
Conclusion
What is SystemTap?
According to http://sourceware.org/systemtap/:

  SystemTap provides free software (GPL) infrastructure to
  simplify the gathering of information about the running
  Linux system. This assists diagnosis of a performance or
  functional problem. SystemTap eliminates the need for
  the developer to go through the tedious and disruptive
  instrument, recompile, install, and reboot sequence that
  may be otherwise required to collect data.

The best tool for observing a live system, provided you know how to observe!
How SystemTap works
1. Write or choose a script describing what you want to observe.
2. stap translates it into a kernel module.
3. stap loads the module and communicates with it.
4. Just wait for your data.
The five passes
# stap -uv test.stp
Pass 1: parsed user script and 74 library script(s) using 86868virt/20488res/1792shr kb, in 190usr/20sys/209real ms.
Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) using 87264virt/21148res/1976shr kb, in 10usr/0sys/7real ms.
Pass 3: translated to C into "/tmp/stapz2iv97/stap_aef621603e006af62084b361e0a0c981_553.c" using 87264virt/21332res/2144shr kb, in 0usr/0sys/0real ms.
Pass 4: compiled C into "stap_aef621603e006af62084b361e0a0c981_553.ko" in 1230usr/160sys/1384real ms.
Pass 5: starting run.
Pass 5: run completed in 10usr/20sys/12331real ms.
SystemTap probe point examples
SystemTap is all about executing certain actions when certain probe points are hit.

• syscall.read
      when entering the read() system call
• syscall.close.return
      when returning from the close() system call
• module("floppy").function("*")
      when entering any function from the "floppy" module
• kernel.function("*@net/socket.c").return
      when returning from any function in file net/socket.c
• kernel.statement("*@kernel/sched.c:2917")
      when hitting line 2917 of file kernel/sched.c
More probe point examples
• timer.ms(200)
      every 200 milliseconds
• process("/bin/ls").function("*")
      when entering any function in /bin/ls (not its libraries or syscalls)
• process("/lib/libc.so.6").function("*malloc*")
      when entering any glibc function which has "malloc" in its name
• kernel.function("*exit*").return
      when returning from any kernel function which has "exit" in its name

RTFM for more (man stapprobes).
The SystemTap programming language
• mostly C-style syntax with a feel of awk
• built-in associative arrays
• built-in aggregates of statistical data
      very easy to collect data and compute statistics on it (average, min, max, count, ...)
• many helper functions (built-in and in tapsets)

RTFM: the SystemTap Language Reference shipped with SystemTap (langref.pdf)
Performance and safety
• language-level safety features
  o no pointers
  o no unbounded loops
  o type inference
  o you can also write probe handlers in C (with -g), but don't complain if you break stuff
• runtime safety features
  o stap enforces a maximum run time for each probe handler
  o various concurrency constraints are enforced
  o overload processing (stap is not allowed to take up all the CPU time)
  o many things can be overridden manually if you really want
  o see the SAFETY AND SECURITY section of stap(1)

The overhead depends a lot on what you are trying to do, but in general stap will try to stop you from doing something stupid (though you can still force it to).
Some helper functions you'll see a lot
pid()               which process is this?
uid()               which user is running this?
execname()          what is the name of this process?
tid()               which thread is this?
gettimeofday_s()    epoch time in seconds
probefunc()         what function are we in?
print_backtrace()   figure out how we ended up here

There are many, many more. RTFM (man stapfuncs) and explore /usr/share/systemtap/tapset/.
Some cool stap options
-x    trace only the specified PID (only for userland probing)
-c    run the given command and trace only it and its children (kernel probes will still fire for all threads)
-L    list probe points matching the given pattern, along with available variables
-d    load the given module's debuginfo to help with symbol resolution in backtraces
-g    embed C code in the stap script (guru mode): unsafe, dangerous and fun
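Requirements
• Probing user-space programs with SystemTap requires utrace support, but this feature has not yet been merged into the upstream Linux kernel. Red Hat's distributions currently support it.
• Source-level tracing requires debug symbol information:
      for distribution packages, install package-debuginfo on RPM distros
      for your own programs, compile with gcc -g -gdwarf-2 -g3
• stap scripts are compiled into kernel modules to run, so root privileges are required.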
Installing SystemTap
On RHEL 5U4, install the kernel debug symbols:
rpm -i kernel-debuginfo-common-2.6.18-164.el5.x86_64.rpm
rpm -i kernel-debuginfo-2.6.18-164.el5.x86_64.rpm
Because the SystemTap shipped with 5U4 is version 0.9.7, upgrade to 1.3:
./configure --prefix=/usr && make && make install
To verify that the installation succeeded:
# stap topsys.stp
                  SYSCALL      COUNT
                     read         48
                    fcntl         42
                      ...
                    fstat          1
--------------------------------------------------------------
Example: who is executing our programs?
Listing: exec.stp
probe syscall.exec* {
  printf("exec %s %s\n", execname(), argstr)
}
$ stap -L 'syscall.exec*'
syscall.execve name:string filename:string args:string argstr:string $filename:char* $argv:char** $envp:char** $regs:struct pt_regs*

# stap exec.stp
exec sshd /usr/sbin/sshd "-R"
exec sshd /bin/bash
Example: who killed my program?
Listing: sigkill.stp
probe signal.send {
  if (sig_name == "SIGKILL")
    printf("%s was sent to %s (pid:%d) by %s uid :%d\n",
           sig_name, pid_name, sig_pid, execname(), uid())
}
# kill -9 `pgrep top`

# stap sigkill.stp
SIGKILL was sent to top (pid:19281) by bash uid :50920
Example tac.c: helper functions
#include <stdio.h>
#include <stddef.h>
#include <string.h>

char* haha = "wahaha\n";

char* read_line(FILE* fp, char* buf, size_t len){ return fgets(buf, len, fp); }

/* Reverse the line in place, leaving the trailing newline where it is. */
char* reverse_line(char* line, size_t l){
  /* sizeof("\n") is 2 (newline plus NUL), so e points at the last
     character before the newline. */
  char *s = line, *e = s + l - sizeof("\n"), t;
  while(s < e) { t = *s, *s = *e, *e = t; s++, e--; }
  return line;
}

void write_line(char* line){ fputs(line, stdout); }
 
Example tac.c continued: main program
int main(int argc, char * argv[]){
  char buf[4096], *line;
  FILE* fp = stdin;
  if(argc != 1) { fp = fopen(argv[1], "r"); }
  if(fp == NULL) { fprintf(stdout, "usage: %s filename\n", argv[0]); return -1; }
  while((line = read_line(fp, buf, sizeof(buf)))){
    line = reverse_line(line, strlen(line));
    write_line(line);
  }
  if(argc != 1) fclose(fp);
  return 0;
}
Compiling tac
# Debug information is required
# gcc -g -gdwarf-2 -g3 tac.c
# Confirm that the symbol information is present
# stap -L 'process("a.out").function("*")'
process("/tmp/a.out").function("main@/tmp/tac.c:25") $argc:int $argv:char** $buf:char[] $line:char* $fp:FILE*
process("/tmp/a.out").function("read_line@/tmp/tac.c:7") $fp:FILE* $buf:char* $len:size_t
process("/tmp/a.out").function("reverse_line@/tmp/tac.c:11") $line:char* $l:size_t $s:char* $e:char* $t:char
process("/tmp/a.out").function("write_line@/tmp/tac.c:21") $line:char*
Example 1: reading the program's arguments
function get_argv_1:long(argv:long) %{ /* pure */
  THIS->__retvalue =(long) ((char**)THIS->argv)[1];
%}
 
probe process("a.out").function("main"){
  filename = "stdin";
  if($argc > 1) {
    filename = user_string(get_argv_1($argv));
  }
  println(filename);
}
Example 1 continued:
# echo "hi"|./a.out
# ./a.out tac.c
 
 
# stap -gu ./ex1.stp
:)
stdin
tac.c
Example 2: callgraph for anything
function trace(entry_p, extra) {
  %( $# > 1 %? if (tid() in trace) %)
  printf("%s%s%s %s\n",
         thread_indent (entry_p),
         (entry_p>0?"->":"<-"),
         probefunc (),
         extra)
}
 
probe $1.call   { trace(1, $$parms) }
probe $1.return { trace(-1, $$return) }
Example 2 continued:
# echo "hi"|./a.out
# sudo stap ./ex2.stp 'process("a.out").function("*")'
:)
     0 a.out(18123):->main argc=0x1 argv=0x7fff351ee0c8
    30 a.out(18123): ->readline fp=0x3f7bb516a0 buf=0x7fff351ecfd0 len=0x1000
   590 a.out(18123): <-readline return=0x7fff351ecfd0
   611 a.out(18123): ->reverse_line line=0x7fff351ecfd0 l=0x3
   625 a.out(18123): <-reverse_line return=0x7fff351ecfd0
   642 a.out(18123): ->write_line line=0x7fff351ecfd0
   731 a.out(18123): <-write_line
   748 a.out(18123): ->readline fp=0x3f7bb516a0 buf=0x7fff351ecfd0 len=0x1000
   762 a.out(18123): <-readline return=0x0
   770 a.out(18123):<-main return=0x0
Example 3: measuring line length
global line_len
probe process("a.out").statement("reverse_line@tac.c+1") {
  # e - s + 2 equals l, given e = s + l - sizeof("\n") in reverse_line
  line_len <<< ($e - $s + 2);
}
probe end {
  if (@count(line_len) > 0) print(@hist_linear(line_len, 8, 128, 8));
}
Example 3 continued:
# ls -al|./a.out
# ./ex3.stp
:)
value |-------------------------------------------------- count
   <8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      64
    8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    69
   16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    68
   24 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    68
   32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    68
   40 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    68
   48 |@@@@@@@@@@@@@@@@@@@@@@@@@                             50
   56 |                                                       0
   64 |                                                       0
Example 4: average time to reverse a line
global t, call_time

probe process("a.out").function("reverse_line") {
  t = gettimeofday_ns()
}

probe process("a.out").function("reverse_line").return {
  call_time <<< (gettimeofday_ns() - t)
}

probe end {
  if (@count(call_time) > 0)
    printf("avg reverse_line execute time: %d ns\n", @avg(call_time))
}
Example 4 continued :
# ls -al|./a.out
# ./ex4.stp
:)
avg reverse_line execute time: 6651 ns
Example 5: printing the call stack
probe process(@1).function(@2){
  print_ubacktrace();
  exit();
}
Example 5 continued:
# ls -al|./a.out
# stap ./ex5.stp './a.out' '*_line'
:)
 0x40066d : reverse_line+0xc/0x61 [a.out]
 0x40078f : main+0xaf/0x100 [a.out]
 0x3bd441d994 [libc-2.5.so+0x1d994/0x357000]
Example 6: modifying the program's behavior
global line
function alert_line(line:long) %{ /* pure */
  strcpy((char*)THIS->line, "abcdefg\n");
%}

probe process("a.out").function("reverse_line") {
  line = user_string($line);
}

probe process("a.out").function("reverse_line").return {
  if (isinstr(line, "tac")) $return = $haha;
  else if (isinstr(line, "hello")) alert_line($return);
}
Example 6 continued:
# stap -g ./ex6.stp
 
# echo tac|./a.out
wahaha
# echo hello|./a.out
abcdefg
# echo world|./a.out
dlrow
Emacs Systemtap mode
• Download systemtap-mode.el here:
  http://coderepos.org/share/browser/lang/elisp/systemtap-mode/systemtap-mode.el?format=txt
• Add the following two lines to your .emacs:
  o (autoload 'systemtap-mode "systemtap-mode")
  o (add-to-list 'auto-mode-alist '("\\.stp$" . systemtap-mode))
References
http://sourceware.org/systemtap/langref/
http://sourceware.org/systemtap/tapsets/
http://baike.corp.taobao.com/images/d/df/Systemtap-haxogreen-2010072301.pdf
http://sourceware.org/systemtap/wiki/AddingUserSpaceProbingToApps
http://github.com/posulliv/stap
http://www.slideshare.net/posullivan/monitoring-mysql-with-dtracesystemtap
Conclusion
SystemTap is often described as "DTrace for Linux".
OProfile takes a sample every $N CPU cycles so you can try to figure out what each CPU is spending its time on.
SystemTap: a household essential!
Thank you, everyone!
Any questions?
