SlideShare a Scribd company logo
1 of 66
Download to read offline
FreeBSD Developer and Vendor Summit, Nov, 2014 
Flame 
Graphs 
on 
FreeBSD 
Brendan 
Gregg 
Senior 
Performance 
Architect 
Performance 
Engineering 
Team 
bgregg@ne5lix.com 
@brendangregg
Agenda 
1. Genesis 
2. Genera=on 
3. CPU 
4. Memory 
5. Disk 
I/O 
6. Off-­‐CPU 
7. Chain
1. 
Genesis
The 
Problem 
• The 
same 
MySQL 
load 
on 
one 
host 
runs 
at 
30% 
higher 
CPU 
than 
another. 
Why? 
• CPU 
profiling 
should 
answer 
this 
easily
# dtrace -x ustackframes=100 -n 'profile-997 /execname == "mysqld"/ {! 
@[ustack()] = count(); } tick-60s { exit(0); }'! 
dtrace: description 'profile-997 ' matched 2 probes! 
CPU ID FUNCTION:NAME! 
1 75195 :tick-60s! 
[...]! 
libc.so.1`__priocntlset+0xa! 
libc.so.1`getparam+0x83! 
libc.so.1`pthread_getschedparam+0x3c! 
libc.so.1`pthread_setschedprio+0x1f! 
mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x9ab! 
mysqld`_Z10do_commandP3THD+0x198! 
mysqld`handle_one_connection+0x1a6! 
libc.so.1`_thrp_setup+0x8d! 
libc.so.1`_lwp_start! 
4884! 
! 
mysqld`_Z13add_to_statusP17system_status_varS0_+0x47! 
mysqld`_Z22calc_sum_of_all_statusP17system_status_var+0x67! 
mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x1222! 
mysqld`_Z10do_commandP3THD+0x198! 
mysqld`handle_one_connection+0x1a6! 
libc.so.1`_thrp_setup+0x8d! 
libc.so.1`_lwp_start! 
5530!
# dtrace -x ustackframes=100 -n 'profile-997 /execname == "mysqld"/ {! 
@[ustack()] = count(); } tick-60s { exit(0); }'! 
dtrace: description 'profile-997 ' matched 2 probes! 
CPU ID FUNCTION:NAME! 
1 75195 :tick-60s! 
[...]! 
libc.so.1`__priocntlset+0xa! 
libc.so.1`getparam+0x83! 
libc.so.1`pthread_getschedparam+0x3c! 
libc.so.1`pthread_setschedprio+0x1f! 
mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x9ab! 
mysqld`_Z10do_commandP3THD+0x198! 
mysqld`handle_one_connection+0x1a6! 
libc.so.1`_thrp_setup+0x8d! 
libc.so.1`_lwp_start! 
4884! 
! 
this 
stack 
was 
sampled 
this 
many 
=mes 
mysqld`_Z13add_to_statusP17system_status_varS0_+0x47! 
mysqld`_Z22calc_sum_of_all_statusP17system_status_var+0x67! 
mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x1222! 
mysqld`_Z10do_commandP3THD+0x198! 
mysqld`handle_one_connection+0x1a6! 
libc.so.1`_thrp_setup+0x8d! 
libc.so.1`_lwp_start! 
5530! 
Only 
unique 
stacks 
are 
shown, 
with 
their 
counts. 
This 
compresses 
the 
output.
# dtrace -x ustackframes=100 -n 'profile-997 /execname == "mysqld"/ {! 
@[ustack()] = count(); } tick-60s { exit(0); }'! 
dtrace: description 'profile-997 ' matched 2 probes! 
CPU ID FUNCTION:NAME! 
1 75195 :tick-60s! 
[...]! 
libc.so.1`__priocntlset+0xa! 
libc.so.1`getparam+0x83! 
libc.so.1`pthread_getschedparam+0x3c! 
libc.so.1`pthread_setschedprio+0x1f! 
mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x9ab! 
mysqld`_Z10do_commandP3THD+0x198! 
mysqld`handle_one_connection+0x1a6! 
libc.so.1`_thrp_setup+0x8d! 
libc.so.1`_lwp_start! 
4884! 
! 
This 
stack 
– 
the 
most 
frequent 
– 
is 
<2% 
of 
the 
samples 
mysqld`_Z13add_to_statusP17system_status_varS0_+0x47! 
mysqld`_Z22calc_sum_of_all_statusP17system_status_var+0x67! 
mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x1222! 
mysqld`_Z10do_commandP3THD+0x198! 
mysqld`handle_one_connection+0x1a6! 
libc.so.1`_thrp_setup+0x8d! 
libc.so.1`_lwp_start! 
5530!
# dtrace -x ustackframes=100 -n 'profile-997 /execname == "mysqld"/ {! 
@[ustack()] = count(); } tick-60s { exit(0); }'! 
dtrace: description 'profile-997 ' matched 2 probes! 
CPU ID FUNCTION:NAME! 
1 75195 :tick-60s! 
[...]! 
libc.so.1`__priocntlset+0xa! 
libc.so.1`getparam+0x83! 
libc.so.1`pthread_getschedparam+0x3c! 
libc.so.1`pthread_setschedprio+0x1f! 
mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x9ab! 
mysqld`_Z10do_commandP3THD+0x198! 
mysqld`handle_one_connection+0x1a6! 
libc.so.1`_thrp_setup+0x8d! 
libc.so.1`_lwp_start! 
4884! 
! 
mysqld`_Z13add_to_statusP17system_status_varS0_+0x47! 
mysqld`_Z22calc_sum_of_all_statusP17system_status_var+0x67! 
mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x1222! 
mysqld`_Z10do_commandP3THD+0x198! 
mysqld`handle_one_connection+0x1a6! 
libc.so.1`_thrp_setup+0x8d! 
libc.so.1`_lwp_start! 
5530! 
Despite 
the 
terse 
output, 
I 
elided 
over 
500,000 
lines 
Here 
is 
what 
all 
the 
output 
looks 
like…
Size 
of 
one 
stack 
Last 
two 
stacks
These 
are 
just 
the 
unique 
stacks. 
I 
have 
to 
compare 
this 
grey 
featureless 
square, 
with 
a 
grey 
square 
from 
the 
other 
host, 
and 
explain 
the 
30% 
CPU 
difference. 
And 
I 
need 
to 
do 
this 
by 
Friday.
Flame 
Graph 
of 
the 
same 
data
Flame 
Graph 
of 
the 
same 
data 
one 
stack 
sample 
stack 
depth 
number 
of 
samples
Problem 
Solved 
• Comparing 
two 
flame 
graphs 
was 
trivial 
– MySQL 
codepath 
difference 
suggested 
different 
compiler 
op=miza=ons, 
which 
was 
confirmed 
• Flame 
graph 
needed 
in 
this 
case 
– Profile 
data 
was 
too 
large 
to 
consume 
otherwise 
– Not 
always 
the 
case: 
the 
profiler 
output 
might 
be 
small 
enough 
to 
read 
directly. 
For 
CPU 
profiles, 
it 
ofen 
isn’t.
Flame 
Graphs: 
Defini=on 
• Boxes: 
are 
func=ons 
– Visualizes 
a 
frame 
of 
a 
stack 
trace 
• y-­‐axis: 
stack 
depth 
– The 
top 
func=on 
led 
to 
the 
profiling 
event, 
everything 
beneath 
it 
is 
ancestry: 
explains 
why 
• x-­‐axis: 
spans 
samples, 
sorted 
alphabe=cally 
– Box 
width 
shows 
sample 
count: 
bigger 
for 
more 
– Alphabe=cal 
sort 
improves 
merging 
of 
like-­‐frames 
• Colors: 
either 
random 
or 
a 
dimension 
– Random 
helps 
separate 
columns
Flame 
Graphs: 
Presenta=on 
• All 
threads 
can 
be 
shown 
in 
the 
same 
graph 
– So 
can 
mul=ple 
distributed 
systems 
• Can 
be 
interac=ve 
– Mouse 
over 
for 
details 
– Click 
to 
zoom 
• Can 
be 
invented 
– Eg, 
Facebook’s 
icicle-­‐like 
flame 
graphs 
• Uses 
for 
color: 
– Differen=als 
– Modes: 
user/library/kernel
2. 
Genera=on
Examples 
• Using 
DTrace 
to 
profile 
kernel 
CPU 
usage: 
# git clone https://github.com/brendangregg/FlameGraph! 
# cd FlameGraph! 
# kldload dtraceall # if needed! 
# dtrace -x stackframes=100 -n 'profile-197 /arg0/ {! 
@[stack()] = count(); } tick-60s { exit(0); }' -o out.stacks! 
# ./stackcollapse.pl out.stacks | ./flamegraph.pl > out.svg! 
• Using 
pmcstat 
to 
profile 
stall 
cycles: 
… ! 
# pmcstat –S RESOURCE_STALLS.ANY -O out.pmcstat sleep 10! 
# pmcstat -R out.pmcstat -z100 -G out.stacks! 
# ./stackcollapse-pmc.pl out.stacks | ./flamegraph.pl > out.svg!
Steps 
1. 
Profile 
Stacks 
2. 
Fold 
Stacks 
3. 
Flame 
Graph
Step 
1. 
Profile 
Stacks 
• FreeBSD 
data 
sources: 
– DTrace 
stack() 
or 
ustack() 
– pmcstat 
-­‐G 
stacks 
– Applica=on 
profilers 
– Anything 
that 
can 
gather 
full 
stacks
Step 
2. 
Fold 
Stacks 
• Profiled 
stacks 
are 
“folded” 
to 
1 
line 
per 
stack: 
func1;func2;func3;… count! 
…! 
• Many 
converters 
exist 
(usually 
Perl). 
Eg: 
Format 
Program 
DTrace 
stackcollapse.pl 
FreeBSD 
pmcstat 
stackcollapse-­‐pmc.pl 
Linux 
perf_events 
stackcollapse-­‐perf.pl 
OS 
X 
Instruments 
stackcollapse-­‐instruments.pl 
Lightweight 
Java 
Profiler 
stackcollapse-­‐ljp.awk
Step 
3. 
Flame 
Graph 
• flamegraph.pl 
converts 
folded 
stacks 
into 
an 
interac=ve 
SVG 
Flame 
Graph 
– Uses 
JavaScript. 
Open 
in 
a 
browser. 
USAGE: ./flamegraph.pl [options] infile > outfile.svg! 
!--title # change title text! 
!--width # width of image (default 1200)! 
!--height # height of each frame (default 16)! 
!--minwidth # omit smaller functions (default 0.1 pixels)! 
!--fonttype # font type (default "Verdana")! 
!--fontsize # font size (default 12)! 
!--countname # count type label (default "samples")! 
!--nametype # name type label (default "Function:")! 
!--colors # "hot", "mem", "io" palette (default "hot")! 
!--hash # colors are keyed by function name hash! 
!--cp # use consistent palette (palette.map)! 
!--reverse # generate stack-reversed flame graph! 
!--inverted # icicle graph! 
!--negate # switch differential hues (blue<->red)!
Extra 
Step: 
Filtering 
• Folded 
stacks 
(single 
line 
records) 
are 
easy 
to 
process 
with 
grep/sed/awk 
• For 
CPU 
profiles, 
I 
commonly 
exclude 
cpu_idle(): 
# ./stackcollapse.pl out.stacks | grep –v cpu_idle | ! 
./flamegraph.pl out.folded > out.svg! 
• Or 
click-­‐to-­‐zoom
3. 
CPU
CPU: 
Stack 
Sampling
cpu-­‐freebsd01.svg
cpu-­‐freebsd02.svg
cpu-­‐freebsd03.svg
DEMO
CPU: 
Commands 
• DTrace 
kernel 
stack 
sampling 
at 
199 
Hertz, 
60 
s: 
# dtrace -x stackframes=100 -n 'profile-199 /arg0/ {! 
@[stack()] = count(); } tick-60s { exit(0); }' -o out.stacks! 
• DTrace 
user 
stack 
sampling 
at 
99 
Hertz, 
60 
s: 
# dtrace -x ustackframes=100 -n 'profile-99 /arg1/ {! 
@[ustack()] = count(); } tick-60s { exit(0); }' -o out.stacks! 
• Warnings: 
– ustack() 
more 
expensive 
– Short-­‐lived 
processes 
will 
miss 
symbol 
transla=on
CPU: 
Commands 
• DTrace 
both 
stack 
sampling 
at 
99 
Hertz, 
30 
s: 
# dtrace -x stackframes=100 -x ustackframes=100 -n '! 
profile-99 { @[stack(), ustack(), execname] = count(); }! 
tick-30s { printa("%k-%k%sn%@dn", @); exit(0); }! 
' -o out.stacks! 
– This 
prints 
kernel 
then 
user 
stacks, 
separated 
by 
a 
“-­‐” 
line. 
flamegraph.pl 
makes 
“-­‐” 
lines 
grey 
(separators) 
• pmcstat 
for 
everything 
beyond 
sampling
4. 
Memory
Memory: 
4 
Targets
vm_faults-­‐kernel01.svg
DEMO
Memory: 
Commands 
• DTrace 
page 
fault 
profiling 
of 
kernel 
stacks, 
30 
s: 
# dtrace –x stackframes=100 -n 'fbt::vm_fault:entry {! 
@[stack()] = count(); } tick-30s { exit(0) }' -o out.stacks! 
• Flame 
graphs 
generated 
with 
-­‐-­‐colors=mem 
# cat out.stacks | ./stackcollapse.pl | ./flamegraph.pl ! 
--title="FreeBSD vm_fault() kernel stacks" --colors=mem ! 
--countname=pages --width=800 > vm_faults-kernel01.svg! 
• See 
earlier 
diagram 
for 
other 
targets
Memory: 
Commands 
• DTrace 
page 
fault 
profiling 
of 
user 
stacks, 
5 
s: 
# dtrace -x ustackframes=100 -n 'fbt::vm_fault:entry {! 
@[ustack(), execname] = count(); } tick-5s { exit(0) }! 
' -o out.stacks! 
• Warnings: 
– Overhead 
for 
user-­‐level 
stack 
transla=on 
rela=ve 
to 
number 
of 
unique 
stacks, 
and 
might 
be 
significant 
– Stacks 
for 
short-­‐lived 
processes 
may 
be 
hex, 
as 
transla=on 
is 
performed 
afer 
the 
process 
has 
exited 
• See 
also 
other 
memory 
target 
types 
(earlier 
pic)
5. 
Disk 
I/O
iostart01.svg
DEMO
Disk 
I/O: 
Commands 
• Device 
I/O 
issued 
with 
kernel 
stacks, 
30 
s: 
# dtrace –x stackframes=100 -n 'io:::start {! 
@[stack()] = count(); } tick-10s { exit(0) }' -o out.stacks! 
• Flame 
graphs 
generated 
with 
-­‐-­‐colors=io 
# cat out.stacks | ./stackcollapse.pl | ./flamegraph.pl ! 
--title="FreeBSD Storage I/O Kernel Flame Graph" --colors=io! 
--countname=io--width=800 > iostart01.svg! 
• Note 
that 
this 
shows 
IOPS; 
would 
be 
beuer 
to 
measure 
and 
show 
latency
6. 
Off-­‐CPU
Off-­‐CPU 
Profiling 
On-­‐CPU 
Profiling 
Thread 
State 
Transi=on 
Diagram 
Off-­‐CPU 
Profiling 
(everything 
else)
off-­‐kernel01.svg
off-­‐both01.svg
DEMO
Off-­‐CPU: 
Commands 
• DTrace 
off-­‐CPU 
=me 
kernel 
stacks, 
10 
s: 
# dtrace -x dynvarsize=8m -x stackframes=100 –n ‘! 
sched:::off-cpu { self->ts = timestamp; }! 
sched:::on-cpu /self->ts/ { @[stack()] =! 
sum(timestamp - self->ts); self->ts = 0; }! 
tick-10s { normalize(@, 1000000); exit(0); }' -o out.stacks! 
• Flame 
graph: 
countname=ms 
• Warning: 
Ofen 
high 
overhead. 
DTrace 
will 
drop: 
dtrace: 886 dynamic variable drops with non-empty dirty list! 
• User-­‐level 
stacks 
more 
interes=ng, 
and 
expensive
Off-­‐CPU: 
Commands 
• DTrace 
off-­‐CPU 
=me 
kernel 
& 
user 
stacks, 
10 
s: 
# dtrace -x dynvarsize=8m -x stackframes=100 -x ustackframes=100 –n '! 
• Beware 
overheads 
• Real 
reason 
for 
blocking 
ofen 
obscured 
– Need 
to 
trace 
the 
wakeups, 
and 
examine 
their 
stacks 
sched:::off-cpu { self->ts = timestamp; } 
sched:::on-cpu /self->ts/ {! 
@[stack(), ustack(), execname] = sum(timestamp - self->ts);! 
self->ts = 0; }! 
tick-10s { normalize(@, 1000000);! 
printa("%k-%k%sn%@dn", @); exit(0); }! 
' -o out.offcpu!
Solve 
ALL 
The 
Issues 
• Tantalizingly 
close 
to 
solving 
all 
perf 
issues: 
On-­‐CPU 
Issues 
Off-­‐CPU 
Issues 
Common 
Common 
Some 
solved 
using 
top 
alone 
Some 
solved 
using 
iostat/systat 
Many 
solved 
using 
CPU 
(Sample) 
Flame 
Graphs 
Some 
solved 
using 
lock 
profiling 
Most 
of 
the 
remainder 
solved 
using 
CPU 
performance 
counters 
Some 
solved 
using 
Off-­‐CPU 
Flame 
Graphs 
Usually 
a 
solved 
problem 
Many 
not 
straigh=orward
7. 
Chain 
Graphs
Walking 
the 
Wakeups…
chain-­‐sshd.svg
DEMO
Chain 
Graphs: 
Commands 
• This 
may 
be 
too 
advanced 
for 
current 
DTrace 
– Can’t 
save 
stacks 
as 
variables 
– Overheads 
for 
tracing 
everything 
can 
become 
serious 
• My 
prototype 
involved 
workarounds 
– Aggrega=ng 
off-­‐CPU-­‐>on-­‐CPU 
=me 
by: 
• execname, 
pid, 
blocking 
CV 
address, 
and 
stacks 
– Aggrega=ng 
sleep-­‐>wakeup 
=me 
by 
the 
same 
– Perl 
post-­‐processing 
to 
connect 
the 
dots 
– Assuming 
that 
CV 
addrs 
aren’t 
reused 
during 
tracing
Grand 
Unified 
Analysis 
• There 
are 
two 
types 
of 
performance 
issues: 
– On-­‐CPU: 
Usually 
solved 
using 
CPU 
flame 
graphs 
– Off-­‐CPU: 
Can 
be 
solved 
using 
chain 
graphs 
• Combining 
CPU 
& 
chain 
graphs 
would 
provide 
a 
unified 
visualiza=on 
for 
all 
perf 
issues 
• Similar 
work 
to 
chain 
graphs: 
– Francis 
Giraldeau 
(École 
Polytechnique 
de 
Montréal), 
wakeup 
graph 
using 
LTTng, 
+ 
distributed 
systems: 
hup://www.tracingsummit.org/w/images/0/00/ 
TracingSummit2014-­‐Why-­‐App-­‐Wai=ng.pdf
Other 
Topics
Hot/Cold 
Flame 
Graphs 
• CPU 
samples 
& 
off-­‐CPU 
=me 
in 
one 
flame 
graph 
– Off-­‐CPU 
=me 
ofen 
dominates 
& 
compresses 
CPU 
=me 
too 
much. 
By-­‐thread 
flame 
graphs 
helps. 
• Example 
by 
Vladimir 
Kirillov, 
adding 
a 
blue 
frame:
CPU 
Counters 
• Use 
pmcstat 
to 
make 
flame 
graphs 
for 
cycles, 
instruc=ons, 
cache 
misses, 
stall 
cycles, 
CPI, 
…
cpi-­‐flamegraph-­‐01.svg
Differen=al 
Flame 
Graphs 
• Just 
added 
(used 
to 
make 
the 
CPI 
flame 
graph) 
• Useful 
for 
non-­‐regression 
tes=ng: 
hue 
shows 
difference 
between 
profile1 
and 
profile2
Flame 
Charts 
• Similar 
to 
flame 
graphs, 
but 
different 
x-­‐axis 
• x-­‐axis: 
passage 
of 
=me 
• Created 
by 
Google 
for 
Chrome/v8, 
inspired 
by 
flame 
graphs 
• Needs 
=mestamped 
stacks 
– Flame 
graphs 
just 
need 
stacks 
& 
counts, 
which 
is 
usually 
much 
less 
data
Other 
Implementa=ons/Uses 
• See 
hup://www.brendangregg.com/flamegraphs.html#Updates 
• Some 
use 
applica=on 
profile 
sources, 
which 
should 
work 
on 
FreeBSD 
• Thanks 
everyone 
who 
has 
contributed!
Other 
Text 
-­‐> 
Interac=ve 
SVG 
Tools 
• Heatmaps: 
hups://github.com/brendangregg/HeatMap
References 
& 
Links 
• Flame 
Graphs: 
• hup://www.brendangregg.com/flamegraphs.html 
• hup://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html 
• hup://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html 
• hup://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html 
• hup://www.brendangregg.com/FlameGraphs/hotcoldflamegraphs.html 
• hup://www.slideshare.net/brendangregg/blazing-­‐performance-­‐with-­‐flame-­‐graphs 
and 
hups://www.youtube.com/watch?v=nZfNehCzGdw 
• hups://github.com/brendangregg/FlameGraph 
• hup://agentzh.org/misc/slides/off-­‐cpu-­‐flame-­‐graphs.pdf 
• Ne5lix 
Open 
Connect 
Appliance 
(FreeBSD): 
• hups://openconnect.itp.ne5lix.com/ 
• Systems 
Performance, 
Pren=ce 
Hall: 
• hup://www.brendangregg.com/sysper€ook.html
Thanks 
• Ques=ons? 
• hup://slideshare.net/brendangregg 
• hup://www.brendangregg.com 
• bgregg@ne5lix.com 
• @brendangregg

More Related Content

What's hot

EuroBSDcon 2017 System Performance Analysis Methodologies
EuroBSDcon 2017 System Performance Analysis MethodologiesEuroBSDcon 2017 System Performance Analysis Methodologies
EuroBSDcon 2017 System Performance Analysis MethodologiesBrendan Gregg
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to RootsBrendan Gregg
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtAnne Nicolas
 
MeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisMeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisBrendan Gregg
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance ToolsBrendan Gregg
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringGeorg Schönberger
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Brendan Gregg
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Performance Wins with BPF: Getting Started
Performance Wins with BPF: Getting StartedPerformance Wins with BPF: Getting Started
Performance Wins with BPF: Getting StartedBrendan Gregg
 
LISA2010 visualizations
LISA2010 visualizationsLISA2010 visualizations
LISA2010 visualizationsBrendan Gregg
 
Real-time in the real world: DIRT in production
Real-time in the real world: DIRT in productionReal-time in the real world: DIRT in production
Real-time in the real world: DIRT in productionbcantrill
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Brendan Gregg
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityBrendan Gregg
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf toolsBrendan Gregg
 
Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Brendan Gregg
 

What's hot (20)

EuroBSDcon 2017 System Performance Analysis Methodologies
EuroBSDcon 2017 System Performance Analysis MethodologiesEuroBSDcon 2017 System Performance Analysis Methodologies
EuroBSDcon 2017 System Performance Analysis Methodologies
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
 
MeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisMeetBSD2014 Performance Analysis
MeetBSD2014 Performance Analysis
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Performance Wins with BPF: Getting Started
Performance Wins with BPF: Getting StartedPerformance Wins with BPF: Getting Started
Performance Wins with BPF: Getting Started
 
LISA2010 visualizations
LISA2010 visualizationsLISA2010 visualizations
LISA2010 visualizations
 
Real-time in the real world: DIRT in production
Real-time in the real world: DIRT in productionReal-time in the real world: DIRT in production
Real-time in the real world: DIRT in production
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF Observability
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
 
Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)
 

Viewers also liked

Performance Tuning EC2 Instances
Performance Tuning EC2 InstancesPerformance Tuning EC2 Instances
Performance Tuning EC2 InstancesBrendan Gregg
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisBrendan Gregg
 
Linux Performance Tools 2014
Linux Performance Tools 2014Linux Performance Tools 2014
Linux Performance Tools 2014Brendan Gregg
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to LinuxBrendan Gregg
 
JavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsJavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsBrendan Gregg
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFBrendan Gregg
 
Linux Performance Analysis and Tools
Linux Performance Analysis and ToolsLinux Performance Analysis and Tools
Linux Performance Analysis and ToolsBrendan Gregg
 
How to Diagnose Problems Quickly on Linux Servers
How to Diagnose Problems Quickly on Linux ServersHow to Diagnose Problems Quickly on Linux Servers
How to Diagnose Problems Quickly on Linux ServersRichard Cunningham
 
The New Systems Performance
The New Systems PerformanceThe New Systems Performance
The New Systems PerformanceBrendan Gregg
 
Introducing Exchange Server 2010
Introducing Exchange Server 2010Introducing Exchange Server 2010
Introducing Exchange Server 2010Harold Wong
 
Lisa12 methodologies
Lisa12 methodologiesLisa12 methodologies
Lisa12 methodologiesBrendan Gregg
 
Introduction to WAMP, a protocol enabling PUB/SUB and RPC over Websocket
Introduction to WAMP, a protocol enabling PUB/SUB and RPC over WebsocketIntroduction to WAMP, a protocol enabling PUB/SUB and RPC over Websocket
Introduction to WAMP, a protocol enabling PUB/SUB and RPC over Websocketsametmax
 
WebLogic Deployment Plan Example
WebLogic Deployment Plan ExampleWebLogic Deployment Plan Example
WebLogic Deployment Plan ExampleJames Bayer
 
Systems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the CloudSystems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the CloudBrendan Gregg
 
Stop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production SystemsStop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production SystemsBrendan Gregg
 

Viewers also liked (17)

Performance Tuning EC2 Instances
Performance Tuning EC2 InstancesPerformance Tuning EC2 Instances
Performance Tuning EC2 Instances
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance Analysis
 
Linux Performance Tools 2014
Linux Performance Tools 2014Linux Performance Tools 2014
Linux Performance Tools 2014
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to Linux
 
JavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsJavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame Graphs
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
 
Linux Performance Analysis and Tools
Linux Performance Analysis and ToolsLinux Performance Analysis and Tools
Linux Performance Analysis and Tools
 
How to Diagnose Problems Quickly on Linux Servers
How to Diagnose Problems Quickly on Linux ServersHow to Diagnose Problems Quickly on Linux Servers
How to Diagnose Problems Quickly on Linux Servers
 
Implandata
ImplandataImplandata
Implandata
 
DTraceCloud2012
DTraceCloud2012DTraceCloud2012
DTraceCloud2012
 
The New Systems Performance
The New Systems PerformanceThe New Systems Performance
The New Systems Performance
 
Introducing Exchange Server 2010
Introducing Exchange Server 2010Introducing Exchange Server 2010
Introducing Exchange Server 2010
 
Lisa12 methodologies
Lisa12 methodologiesLisa12 methodologies
Lisa12 methodologies
 
Introduction to WAMP, a protocol enabling PUB/SUB and RPC over Websocket
Introduction to WAMP, a protocol enabling PUB/SUB and RPC over WebsocketIntroduction to WAMP, a protocol enabling PUB/SUB and RPC over Websocket
Introduction to WAMP, a protocol enabling PUB/SUB and RPC over Websocket
 
WebLogic Deployment Plan Example
WebLogic Deployment Plan ExampleWebLogic Deployment Plan Example
WebLogic Deployment Plan Example
 
Systems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the CloudSystems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the Cloud
 
Stop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production SystemsStop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production Systems
 

Similar to FreeBSD 2014 Flame Graphs

Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Ontico
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby SystemsEngine Yard
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachAlexandre Rafalovitch
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Lucidworks
 
D Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance ProblemsD Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance ProblemsMySQLConference
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Altinity Ltd
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak PROIDEA
 
Advanced Replication
Advanced ReplicationAdvanced Replication
Advanced ReplicationMongoDB
 
rsyslog v8: more than just syslog!
rsyslog v8: more than just syslog!rsyslog v8: more than just syslog!
rsyslog v8: more than just syslog!Yury Bushmelev
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacketCsw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacketCanSecWest
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationDatabricks
 
Bottom to Top Stack Optimization with LAMP
Bottom to Top Stack Optimization with LAMPBottom to Top Stack Optimization with LAMP
Bottom to Top Stack Optimization with LAMPkatzgrau
 
Bottom to Top Stack Optimization - CICON2011
Bottom to Top Stack Optimization - CICON2011Bottom to Top Stack Optimization - CICON2011
Bottom to Top Stack Optimization - CICON2011CodeIgniter Conference
 
Perl at SkyCon'12
Perl at SkyCon'12Perl at SkyCon'12
Perl at SkyCon'12Tim Bunce
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory AnalysisMoabi.com
 
Sergi Álvarez + Roi Martín - radare2: From forensics to bindiffing [RootedCON...
Sergi Álvarez + Roi Martín - radare2: From forensics to bindiffing [RootedCON...Sergi Álvarez + Roi Martín - radare2: From forensics to bindiffing [RootedCON...
Sergi Álvarez + Roi Martín - radare2: From forensics to bindiffing [RootedCON...RootedCON
 
Stack kicker devopsdays-london-2013
Stack kicker devopsdays-london-2013Stack kicker devopsdays-london-2013
Stack kicker devopsdays-london-2013Simon McCartney
 

Similar to FreeBSD 2014 Flame Graphs (20)

Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby Systems
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
 
D Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance ProblemsD Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance Problems
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
 
Os Wilhelm
Os WilhelmOs Wilhelm
Os Wilhelm
 
Advanced Replication
Advanced ReplicationAdvanced Replication
Advanced Replication
 
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
 
rsyslog v8: more than just syslog!
rsyslog v8: more than just syslog!rsyslog v8: more than just syslog!
rsyslog v8: more than just syslog!
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacketCsw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacket
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
 
Bottom to Top Stack Optimization with LAMP
Bottom to Top Stack Optimization with LAMPBottom to Top Stack Optimization with LAMP
Bottom to Top Stack Optimization with LAMP
 
Bottom to Top Stack Optimization - CICON2011
Bottom to Top Stack Optimization - CICON2011Bottom to Top Stack Optimization - CICON2011
Bottom to Top Stack Optimization - CICON2011
 
Perl at SkyCon'12
Perl at SkyCon'12Perl at SkyCon'12
Perl at SkyCon'12
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis
 
Sergi Álvarez + Roi Martín - radare2: From forensics to bindiffing [RootedCON...
Sergi Álvarez + Roi Martín - radare2: From forensics to bindiffing [RootedCON...Sergi Álvarez + Roi Martín - radare2: From forensics to bindiffing [RootedCON...
Sergi Álvarez + Roi Martín - radare2: From forensics to bindiffing [RootedCON...
 
Stack kicker devopsdays-london-2013
Stack kicker devopsdays-london-2013Stack kicker devopsdays-london-2013
Stack kicker devopsdays-london-2013
 

More from Brendan Gregg

YOW2021 Computing Performance
YOW2021 Computing PerformanceYOW2021 Computing Performance
YOW2021 Computing PerformanceBrendan Gregg
 
IntelON 2021 Processor Benchmarking
IntelON 2021 Processor BenchmarkingIntelON 2021 Processor Benchmarking
IntelON 2021 Processor BenchmarkingBrendan Gregg
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedBrendan Gregg
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Brendan Gregg
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)Brendan Gregg
 
re:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflixre:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at NetflixBrendan Gregg
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareBrendan Gregg
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceBrendan Gregg
 
LPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsLPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsBrendan Gregg
 
YOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflixYOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflixBrendan Gregg
 
eBPF Perf Tools 2019
eBPF Perf Tools 2019eBPF Perf Tools 2019
eBPF Perf Tools 2019Brendan Gregg
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
 
NetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityNetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityBrendan Gregg
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018Brendan Gregg
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceBrendan Gregg
 
LISA17 Container Performance Analysis
LISA17 Container Performance AnalysisLISA17 Container Performance Analysis
LISA17 Container Performance AnalysisBrendan Gregg
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixBrendan Gregg
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFBrendan Gregg
 

More from Brendan Gregg (20)

YOW2021 Computing Performance
YOW2021 Computing PerformanceYOW2021 Computing Performance
YOW2021 Computing Performance
 
IntelON 2021 Processor Benchmarking
IntelON 2021 Processor BenchmarkingIntelON 2021 Processor Benchmarking
IntelON 2021 Processor Benchmarking
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting Started
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
re:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflixre:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflix
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
 
LPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsLPC2019 BPF Tracing Tools
LPC2019 BPF Tracing Tools
 
YOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflixYOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflix
 
eBPF Perf Tools 2019
eBPF Perf Tools 2019eBPF Perf Tools 2019
eBPF Perf Tools 2019
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
BPF Tools 2017
BPF Tools 2017BPF Tools 2017
BPF Tools 2017
 
NetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityNetConf 2018 BPF Observability
NetConf 2018 BPF Observability
 
FlameScope 2018
FlameScope 2018FlameScope 2018
FlameScope 2018
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
 
LISA17 Container Performance Analysis
LISA17 Container Performance AnalysisLISA17 Container Performance Analysis
LISA17 Container Performance Analysis
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPF
 

Recently uploaded

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Recently uploaded (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 

FreeBSD 2014 Flame Graphs

  • 1. FreeBSD Developer and Vendor Summit, Nov, 2014 Flame Graphs on FreeBSD Brendan Gregg Senior Performance Architect Performance Engineering Team bgregg@ne5lix.com @brendangregg
  • 2.
  • 3. Agenda 1. Genesis 2. Genera=on 3. CPU 4. Memory 5. Disk I/O 6. Off-­‐CPU 7. Chain
  • 5. The Problem • The same MySQL load on one host runs at 30% higher CPU than another. Why? • CPU profiling should answer this easily
  • 6. # dtrace -x ustackframes=100 -n 'profile-997 /execname == "mysqld"/ {! @[ustack()] = count(); } tick-60s { exit(0); }'! dtrace: description 'profile-997 ' matched 2 probes! CPU ID FUNCTION:NAME! 1 75195 :tick-60s! [...]! libc.so.1`__priocntlset+0xa! libc.so.1`getparam+0x83! libc.so.1`pthread_getschedparam+0x3c! libc.so.1`pthread_setschedprio+0x1f! mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x9ab! mysqld`_Z10do_commandP3THD+0x198! mysqld`handle_one_connection+0x1a6! libc.so.1`_thrp_setup+0x8d! libc.so.1`_lwp_start! 4884! ! mysqld`_Z13add_to_statusP17system_status_varS0_+0x47! mysqld`_Z22calc_sum_of_all_statusP17system_status_var+0x67! mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x1222! mysqld`_Z10do_commandP3THD+0x198! mysqld`handle_one_connection+0x1a6! libc.so.1`_thrp_setup+0x8d! libc.so.1`_lwp_start! 5530!
  • 7. # dtrace -x ustackframes=100 -n 'profile-997 /execname == "mysqld"/ {! @[ustack()] = count(); } tick-60s { exit(0); }'! dtrace: description 'profile-997 ' matched 2 probes! CPU ID FUNCTION:NAME! 1 75195 :tick-60s! [...]! libc.so.1`__priocntlset+0xa! libc.so.1`getparam+0x83! libc.so.1`pthread_getschedparam+0x3c! libc.so.1`pthread_setschedprio+0x1f! mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x9ab! mysqld`_Z10do_commandP3THD+0x198! mysqld`handle_one_connection+0x1a6! libc.so.1`_thrp_setup+0x8d! libc.so.1`_lwp_start! 4884! ! this stack was sampled this many =mes mysqld`_Z13add_to_statusP17system_status_varS0_+0x47! mysqld`_Z22calc_sum_of_all_statusP17system_status_var+0x67! mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x1222! mysqld`_Z10do_commandP3THD+0x198! mysqld`handle_one_connection+0x1a6! libc.so.1`_thrp_setup+0x8d! libc.so.1`_lwp_start! 5530! Only unique stacks are shown, with their counts. This compresses the output.
  • 8. # dtrace -x ustackframes=100 -n 'profile-997 /execname == "mysqld"/ {! @[ustack()] = count(); } tick-60s { exit(0); }'! dtrace: description 'profile-997 ' matched 2 probes! CPU ID FUNCTION:NAME! 1 75195 :tick-60s! [...]! libc.so.1`__priocntlset+0xa! libc.so.1`getparam+0x83! libc.so.1`pthread_getschedparam+0x3c! libc.so.1`pthread_setschedprio+0x1f! mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x9ab! mysqld`_Z10do_commandP3THD+0x198! mysqld`handle_one_connection+0x1a6! libc.so.1`_thrp_setup+0x8d! libc.so.1`_lwp_start! 4884! ! This stack – the most frequent – is <2% of the samples mysqld`_Z13add_to_statusP17system_status_varS0_+0x47! mysqld`_Z22calc_sum_of_all_statusP17system_status_var+0x67! mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x1222! mysqld`_Z10do_commandP3THD+0x198! mysqld`handle_one_connection+0x1a6! libc.so.1`_thrp_setup+0x8d! libc.so.1`_lwp_start! 5530!
  • 9. # dtrace -x ustackframes=100 -n 'profile-997 /execname == "mysqld"/ {! @[ustack()] = count(); } tick-60s { exit(0); }'! dtrace: description 'profile-997 ' matched 2 probes! CPU ID FUNCTION:NAME! 1 75195 :tick-60s! [...]! libc.so.1`__priocntlset+0xa! libc.so.1`getparam+0x83! libc.so.1`pthread_getschedparam+0x3c! libc.so.1`pthread_setschedprio+0x1f! mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x9ab! mysqld`_Z10do_commandP3THD+0x198! mysqld`handle_one_connection+0x1a6! libc.so.1`_thrp_setup+0x8d! libc.so.1`_lwp_start! 4884! ! mysqld`_Z13add_to_statusP17system_status_varS0_+0x47! mysqld`_Z22calc_sum_of_all_statusP17system_status_var+0x67! mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x1222! mysqld`_Z10do_commandP3THD+0x198! mysqld`handle_one_connection+0x1a6! libc.so.1`_thrp_setup+0x8d! libc.so.1`_lwp_start! 5530! Despite the terse output, I elided over 500,000 lines Here is what all the output looks like…
  • 10.
  • 11. Size of one stack Last two stacks
  • 12. These are just the unique stacks. I have to compare this grey featureless square, with a grey square from the other host, and explain the 30% CPU difference. And I need to do this by Friday.
  • 13. Flame Graph of the same data
  • 14. Flame Graph of the same data one stack sample stack depth number of samples
  • 15. Problem Solved • Comparing two flame graphs was trivial – MySQL codepath difference suggested different compiler op=miza=ons, which was confirmed • Flame graph needed in this case – Profile data was too large to consume otherwise – Not always the case: the profiler output might be small enough to read directly. For CPU profiles, it ofen isn’t.
  • 16. Flame Graphs: Defini=on • Boxes: are func=ons – Visualizes a frame of a stack trace • y-­‐axis: stack depth – The top func=on led to the profiling event, everything beneath it is ancestry: explains why • x-­‐axis: spans samples, sorted alphabe=cally – Box width shows sample count: bigger for more – Alphabe=cal sort improves merging of like-­‐frames • Colors: either random or a dimension – Random helps separate columns
  • 17. Flame Graphs: Presenta=on • All threads can be shown in the same graph – So can mul=ple distributed systems • Can be interac=ve – Mouse over for details – Click to zoom • Can be invented – Eg, Facebook’s icicle-­‐like flame graphs • Uses for color: – Differen=als – Modes: user/library/kernel
  • 19. Examples • Using DTrace to profile kernel CPU usage: # git clone https://github.com/brendangregg/FlameGraph! # cd FlameGraph! # kldload dtraceall # if needed! # dtrace -x stackframes=100 -n 'profile-197 /arg0/ {! @[stack()] = count(); } tick-60s { exit(0); }' -o out.stacks! # ./stackcollapse.pl out.stacks | ./flamegraph.pl > out.svg! • Using pmcstat to profile stall cycles: … ! # pmcstat –S RESOURCE_STALLS.ANY -O out.pmcstat sleep 10! # pmcstat -R out.pmcstat -z100 -G out.stacks! # ./stackcollapse-pmc.pl out.stacks | ./flamegraph.pl > out.svg!
  • 20. Steps 1. Profile Stacks 2. Fold Stacks 3. Flame Graph
  • 21. Step 1. Profile Stacks • FreeBSD data sources: – DTrace stack() or ustack() – pmcstat -­‐G stacks – Applica=on profilers – Anything that can gather full stacks
  • 22. Step 2. Fold Stacks • Profiled stacks are “folded” to 1 line per stack: func1;func2;func3;… count! …! • Many converters exist (usually Perl). Eg: Format Program DTrace stackcollapse.pl FreeBSD pmcstat stackcollapse-­‐pmc.pl Linux perf_events stackcollapse-­‐perf.pl OS X Instruments stackcollapse-­‐instruments.pl Lightweight Java Profiler stackcollapse-­‐ljp.awk
  • 23. Step 3. Flame Graph • flamegraph.pl converts folded stacks into an interac=ve SVG Flame Graph – Uses JavaScript. Open in a browser. USAGE: ./flamegraph.pl [options] infile > outfile.svg! !--title # change title text! !--width # width of image (default 1200)! !--height # height of each frame (default 16)! !--minwidth # omit smaller functions (default 0.1 pixels)! !--fonttype # font type (default "Verdana")! !--fontsize # font size (default 12)! !--countname # count type label (default "samples")! !--nametype # name type label (default "Function:")! !--colors # "hot", "mem", "io" palette (default "hot")! !--hash # colors are keyed by function name hash! !--cp # use consistent palette (palette.map)! !--reverse # generate stack-reversed flame graph! !--inverted # icicle graph! !--negate # switch differential hues (blue<->red)!
  • 24. Extra Step: Filtering • Folded stacks (single line records) are easy to process with grep/sed/awk • For CPU profiles, I commonly exclude cpu_idle(): # ./stackcollapse.pl out.stacks | grep –v cpu_idle | ! ./flamegraph.pl out.folded > out.svg! • Or click-­‐to-­‐zoom
  • 30. DEMO
  • 31. CPU: Commands • DTrace kernel stack sampling at 199 Hertz, 60 s: # dtrace -x stackframes=100 -n 'profile-199 /arg0/ {! @[stack()] = count(); } tick-60s { exit(0); }' -o out.stacks! • DTrace user stack sampling at 99 Hertz, 60 s: # dtrace -x ustackframes=100 -n 'profile-99 /arg1/ {! @[ustack()] = count(); } tick-60s { exit(0); }' -o out.stacks! • Warnings: – ustack() more expensive – Short-­‐lived processes will miss symbol transla=on
  • 32. CPU: Commands • DTrace both stack sampling at 99 Hertz, 30 s: # dtrace -x stackframes=100 -x ustackframes=100 -n '! profile-99 { @[stack(), ustack(), execname] = count(); }! tick-30s { printa("%k-%k%sn%@dn", @); exit(0); }! ' -o out.stacks! – This prints kernel then user stacks, separated by a “-­‐” line. flamegraph.pl makes “-­‐” lines grey (separators) • pmcstat for everything beyond sampling
  • 36. DEMO
  • 37. Memory: Commands • DTrace page fault profiling of kernel stacks, 30 s: # dtrace –x stackframes=100 -n 'fbt::vm_fault:entry {! @[stack()] = count(); } tick-30s { exit(0) }' -o out.stacks! • Flame graphs generated with -­‐-­‐colors=mem # cat out.stacks | ./stackcollapse.pl | ./flamegraph.pl ! --title="FreeBSD vm_fault() kernel stacks" --colors=mem ! --countname=pages --width=800 > vm_faults-kernel01.svg! • See earlier diagram for other targets
  • 38. Memory: Commands • DTrace page fault profiling of user stacks, 5 s: # dtrace -x ustackframes=100 -n 'fbt::vm_fault:entry {! @[ustack(), execname] = count(); } tick-5s { exit(0) }! ' -o out.stacks! • Warnings: – Overhead for user-­‐level stack transla=on rela=ve to number of unique stacks, and might be significant – Stacks for short-­‐lived processes may be hex, as transla=on is performed afer the process has exited • See also other memory target types (earlier pic)
  • 41. DEMO
  • 42. Disk I/O: Commands • Device I/O issued with kernel stacks, 30 s: # dtrace –x stackframes=100 -n 'io:::start {! @[stack()] = count(); } tick-10s { exit(0) }' -o out.stacks! • Flame graphs generated with -­‐-­‐colors=io # cat out.stacks | ./stackcollapse.pl | ./flamegraph.pl ! --title="FreeBSD Storage I/O Kernel Flame Graph" --colors=io! --countname=io--width=800 > iostart01.svg! • Note that this shows IOPS; would be beuer to measure and show latency
  • 44. Off-­‐CPU Profiling On-­‐CPU Profiling Thread State Transi=on Diagram Off-­‐CPU Profiling (everything else)
  • 47. DEMO
  • 48. Off-­‐CPU: Commands • DTrace off-­‐CPU =me kernel stacks, 10 s: # dtrace -x dynvarsize=8m -x stackframes=100 –n ‘! sched:::off-cpu { self->ts = timestamp; }! sched:::on-cpu /self->ts/ { @[stack()] =! sum(timestamp - self->ts); self->ts = 0; }! tick-10s { normalize(@, 1000000); exit(0); }' -o out.stacks! • Flame graph: countname=ms • Warning: Ofen high overhead. DTrace will drop: dtrace: 886 dynamic variable drops with non-empty dirty list! • User-­‐level stacks more interes=ng, and expensive
  • 49. Off-­‐CPU: Commands • DTrace off-­‐CPU =me kernel & user stacks, 10 s: # dtrace -x dynvarsize=8m -x stackframes=100 -x ustackframes=100 –n '! • Beware overheads • Real reason for blocking ofen obscured – Need to trace the wakeups, and examine their stacks sched:::off-cpu { self->ts = timestamp; } sched:::on-cpu /self->ts/ {! @[stack(), ustack(), execname] = sum(timestamp - self->ts);! self->ts = 0; }! tick-10s { normalize(@, 1000000);! printa("%k-%k%sn%@dn", @); exit(0); }! ' -o out.offcpu!
  • 50. Solve ALL The Issues • Tantalizingly close to solving all perf issues: On-­‐CPU Issues Off-­‐CPU Issues Common Common Some solved using top alone Some solved using iostat/systat Many solved using CPU (Sample) Flame Graphs Some solved using lock profiling Most of the remainder solved using CPU performance counters Some solved using Off-­‐CPU Flame Graphs Usually a solved problem Many not straigh=orward
  • 54. DEMO
  • 55. Chain Graphs: Commands • This may be too advanced for current DTrace – Can’t save stacks as variables – Overheads for tracing everything can become serious • My prototype involved workarounds – Aggrega=ng off-­‐CPU-­‐>on-­‐CPU =me by: • execname, pid, blocking CV address, and stacks – Aggrega=ng sleep-­‐>wakeup =me by the same – Perl post-­‐processing to connect the dots – Assuming that CV addrs aren’t reused during tracing
  • 56. Grand Unified Analysis • There are two types of performance issues: – On-­‐CPU: Usually solved using CPU flame graphs – Off-­‐CPU: Can be solved using chain graphs • Combining CPU & chain graphs would provide a unified visualiza=on for all perf issues • Similar work to chain graphs: – Francis Giraldeau (École Polytechnique de Montréal), wakeup graph using LTTng, + distributed systems: hup://www.tracingsummit.org/w/images/0/00/ TracingSummit2014-­‐Why-­‐App-­‐Wai=ng.pdf
  • 58. Hot/Cold Flame Graphs • CPU samples & off-­‐CPU =me in one flame graph – Off-­‐CPU =me ofen dominates & compresses CPU =me too much. By-­‐thread flame graphs helps. • Example by Vladimir Kirillov, adding a blue frame:
  • 59. CPU Counters • Use pmcstat to make flame graphs for cycles, instruc=ons, cache misses, stall cycles, CPI, …
  • 61. Differen=al Flame Graphs • Just added (used to make the CPI flame graph) • Useful for non-­‐regression tes=ng: hue shows difference between profile1 and profile2
  • 62. Flame Charts • Similar to flame graphs, but different x-­‐axis • x-­‐axis: passage of =me • Created by Google for Chrome/v8, inspired by flame graphs • Needs =mestamped stacks – Flame graphs just need stacks & counts, which is usually much less data
  • 63. Other Implementa=ons/Uses • See hup://www.brendangregg.com/flamegraphs.html#Updates • Some use applica=on profile sources, which should work on FreeBSD • Thanks everyone who has contributed!
  • 64. Other Text -­‐> Interac=ve SVG Tools • Heatmaps: hups://github.com/brendangregg/HeatMap
  • 65. References & Links • Flame Graphs: • hup://www.brendangregg.com/flamegraphs.html • hup://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html • hup://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html • hup://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html • hup://www.brendangregg.com/FlameGraphs/hotcoldflamegraphs.html • hup://www.slideshare.net/brendangregg/blazing-­‐performance-­‐with-­‐flame-­‐graphs and hups://www.youtube.com/watch?v=nZfNehCzGdw • hups://github.com/brendangregg/FlameGraph • hup://agentzh.org/misc/slides/off-­‐cpu-­‐flame-­‐graphs.pdf • Ne5lix Open Connect Appliance (FreeBSD): • hups://openconnect.itp.ne5lix.com/ • Systems Performance, Pren=ce Hall: • hup://www.brendangregg.com/sysper€ook.html
  • 66. Thanks • Ques=ons? • hup://slideshare.net/brendangregg • hup://www.brendangregg.com • bgregg@ne5lix.com • @brendangregg