Intro to Linux Performance Analysis
Chris McEniry
LOPSA-SD
March 27, 2014
Me
• Systems Architect
• Sony Network Entertainment
• 18 years running stuff
• Majority of the last 14 years: medium-large Internet services
Read this book…
And look here:
http://www.brendangregg.com/
http://www.brendangregg.com/methodology.html
http://www.brendangregg.com/Slides/LISA2012_methodologies.pdf
http://www.amazon.com/Systems-Performance-Enterprise-Brendan-Gregg/dp/0133390098
The website is down!!!
It’s just too slow!
The DB is too slow!
The disk is too slow!
SLOW!!!
http://farm4.staticflickr.com/3190/2976755407_6a6a574596_o.jpg
SLOW!!!!
• What does slow mean anyway?
• Is it not transferring fast enough?
• Is it handling too many requests (or too few)?
http://commons.wikimedia.org/wiki/File:United_States_sign_-_Slow_Traffic_Ahead.svg
Slow can mean…
• Latency: how long it takes (ms, s, request time, etc.)
• Throughput: how much work gets done per unit of time (bandwidth, IOPS, rps, tps, etc.)
http://upload.wikimedia.org/wikipedia/commons/2/2e/Miniature_DNF_Dictionary_055_ubt.JPG
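A small Python sketch of the difference, assuming you have (start, end) timestamps in seconds for each completed request. The function name and input format are made up for illustration.

```python
def latency_and_throughput(requests):
    """Return (mean latency in s, throughput in requests/s).

    requests: list of (start_s, end_s) tuples, one per completed request.
    """
    if not requests:
        raise ValueError("need at least one request")
    # Latency: how long each individual request took, averaged.
    latencies = [end - start for start, end in requests]
    mean_latency = sum(latencies) / len(latencies)
    # Throughput: completed requests per second over the observed window.
    window = max(end for _, end in requests) - min(start for start, _ in requests)
    if window <= 0:
        raise ValueError("observation window must be longer than zero")
    return mean_latency, len(requests) / window


back_to_back = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5), (1.5, 2.0)]
overlapping = [(0.0, 0.5)] * 4
print(latency_and_throughput(back_to_back))  # (0.5, 2.0)
print(latency_and_throughput(overlapping))   # (0.5, 8.0)
```

Same latency, four times the throughput: the two can move independently, so "slow" has to say which one it means.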
Slowness comes from…
• Full utilization of a resource
• Waiting in a saturated queue
• Generated errors!
• The USE Method: Utilization, Saturation, Errors
http://farm6.staticflickr.com/5181/5614813544_a30d693a50_o.jpg
Utilization
• You have fully used up what’s been allocated
• aka 5 lb bag
http://farm3.staticflickr.com/2524/4000641774_3331fe06fb_o.jpg
Saturation
• Waiting for someone else to get done so you can do yours
• Typically because a resource is fully utilized, but not necessarily directly
http://www.fotocommunity.com/pc/pc/display/30396619
Errors
• Dropped packets
• Incorrect responses
• Deadlocks
• Timeouts
• Not all failures fail fast
http://farm8.staticflickr.com/7001/6509400855_aaaf915871_b.jpg
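Put together, Utilization, Saturation, and Errors become a checklist you run against every resource, not just the one you suspect. A minimal Python sketch of that loop follows; the Resource fields, the sample numbers, and the 70% utilization threshold are assumptions for illustration. In practice the numbers would come from the tools covered in this talk.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    utilization: float  # fraction busy or in use, 0.0 to 1.0
    saturation: float   # work waiting in a queue (0 means nothing waiting)
    errors: int         # error events seen in the same interval


def use_check(resources, util_threshold=0.7):
    """Ask the three USE questions of every resource; return what looks wrong.

    util_threshold is an assumed cut-off, not part of the method itself.
    """
    findings = []
    for r in resources:
        if r.errors > 0:
            findings.append(f"{r.name}: errors={r.errors}")
        if r.utilization >= util_threshold:
            findings.append(f"{r.name}: utilization={r.utilization:.0%}")
        if r.saturation > 0:
            findings.append(f"{r.name}: saturation={r.saturation}")
    return findings


print(use_check([
    Resource("cpu", utilization=0.95, saturation=3.0, errors=0),
    Resource("disk", utilization=0.20, saturation=0.0, errors=0),
    Resource("nic", utilization=0.40, saturation=0.0, errors=12),
]))
```

The point is the iteration: every resource gets all three questions, which is what keeps this from turning into the Streetlight anti-method.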
How do we determine?
• Different types of tools for different examinations
• Depends on what you’re looking for (which can be a problem in and of itself)
http://farm5.staticflickr.com/4083/5086955738_61f6455ace_b.jpg
Resource vs Transaction
• Do you care if…
• a CPU is maxed out?
• processes are blocked?
• packets are lost?
• or if…
• a user’s request fails?
• a user gives up on waiting for a response?
Maturity
• Tracing tools, especially in production, require a level of maturity
• I’m not that mature… ;)
• No, really: just focusing on the basics first
http://upload.wikimedia.org/wikipedia/commons/b/bd/OFLC_large_R18%2B.svg
http://image.slidesharecdn.com/scalelinuxperformance-130224171331-phpapp01/95/slide-15-638.jpg?cb=1362166290
http://image.slidesharecdn.com/scalelinuxperformance-130224171331-phpapp01/95/slide-16-638.jpg?cb=1362166290
General
• /var/log/messages: Errors (mostly; sometimes stats go here)
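A first pass over the log for errors can be scripted. This Python sketch keeps the lines that match a few error-ish keywords; the keyword list and the sample lines are assumptions, and the filter will both miss real problems and flag harmless lines, so treat it as a starting point rather than a verdict.

```python
import re

# Assumed keyword list; it will miss some problems and flag some noise.
ERROR_PATTERN = re.compile(
    r"\b(errors?|fail(?:ed|ure)?|out of memory|oom|segfault|timed out|timeout)\b",
    re.IGNORECASE,
)


def error_lines(lines):
    """Return the log lines that mention one of the error keywords."""
    return [line for line in lines if ERROR_PATTERN.search(line)]


log_sample = [
    "Mar 27 10:00:01 web1 kernel: Out of memory: Kill process 1234 (java)",
    "Mar 27 10:00:02 web1 sshd[99]: Accepted publickey for chris",
    "Mar 27 10:00:04 web1 httpd[42]: connection timed out",
]
for line in error_lines(log_sample):
    print(line)
```

Against a real system, pass it an open /var/log/messages file handle instead of the sample list (reading that file usually needs root).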
CPU
• uptime: Saturation of the scheduler
• top: Saturation, Utilization
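uptime's load averages are easier to read against the CPU count: a value that stays above 1.0 per CPU suggests runnable threads are queuing. A small Python sketch using the standard library's os.getloadavg(), which returns the same three numbers uptime prints; the function name is hypothetical. One caveat: Linux load averages also count tasks in uninterruptible sleep (usually waiting on disk), so a high load is not always CPU saturation.

```python
import os


def load_per_cpu(loads, ncpu):
    """Scale the (1, 5, 15)-minute load averages by the number of CPUs.

    Values persistently above 1.0 suggest work is waiting for a CPU.
    """
    return tuple(load / ncpu for load in loads)


if hasattr(os, "getloadavg"):  # Unix only
    print(load_per_cpu(os.getloadavg(), os.cpu_count() or 1))
```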
Memory
• free: Utilization
• vmstat: Saturation, Utilization, Counts
• slabtop: Utilization
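free gets its numbers from /proc/meminfo. A minimal Python sketch of a utilization figure from that file, assuming a kernel that reports MemAvailable (3.14 and later): counting available memory rather than MemFree keeps reclaimable page cache from looking like it is in use. For memory saturation, watch swapping instead (the si/so columns in vmstat). The function name and sample values are made up.

```python
def memory_utilization(meminfo_text):
    """Fraction of RAM in use, from the contents of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are in kB
    total = fields["MemTotal"]
    return (total - fields["MemAvailable"]) / total


meminfo_sample = """MemTotal:        8000000 kB
MemFree:          500000 kB
MemAvailable:    2000000 kB
Cached:          1200000 kB
"""
print(f"{memory_utilization(meminfo_sample):.0%} used")
```

On a live system, pass it the text of /proc/meminfo instead of the sample.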
Disk
• df: Utilization
• iostat -x: you may be able to derive additional utilization if you know the device's maximum r/s or w/s, but that is less clear-cut because the maximum depends on properties of the I/O (such as request size and access pattern).
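For its %util column, iostat -x uses the kernel's per-device "milliseconds spent doing I/O" counter from /proc/diskstats: the change over an interval divided by the interval. The Python sketch below redoes that arithmetic on two made-up snapshots; the function name is hypothetical. Because the counter only records whether the device was busy, 100% does not necessarily mean a RAID set, SSD, or SAN LUN that serves requests in parallel is out of capacity.

```python
def disk_util_percent(before, after, interval_s, device):
    """iostat-style %util for one device from two /proc/diskstats snapshots."""

    def io_ms(snapshot):
        for line in snapshot.splitlines():
            parts = line.split()
            # Field 3 is the device name; field 13 (index 12) is ms spent doing I/O.
            if len(parts) > 12 and parts[2] == device:
                return int(parts[12])
        raise KeyError(device)

    busy_ms = io_ms(after) - io_ms(before)
    return 100.0 * busy_ms / (interval_s * 1000.0)


before = "   8       0 sda 1000 10 20000 500 2000 20 40000 800 0 1200 1300"
after = "   8       0 sda 1100 10 22000 550 2100 20 42000 850 1 1950 2100"
print(disk_util_percent(before, after, interval_s=1.0, device="sda"))  # 75.0
```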
IO (Network)
• ping: Errors
• netstat: Saturation
• netstat -s: Errors
• ifconfig: Saturation, Utilization, Errors
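The RX/TX error and drop counters that ifconfig shows are also in /proc/net/dev. They are cumulative since boot, so the useful question is whether they are still growing: take two snapshots a few seconds apart and compare. A Python sketch of the parsing step, with a made-up sample; the function name is hypothetical.

```python
def interface_errors(net_dev_text):
    """Map interface name -> error/drop counters parsed from /proc/net/dev."""
    result = {}
    for line in net_dev_text.splitlines()[2:]:  # skip the two header lines
        name, sep, data = line.partition(":")
        if not sep:
            continue
        v = [int(x) for x in data.split()]
        # Receive columns come first (8 of them), then transmit (8 more).
        result[name.strip()] = {
            "rx_errs": v[2], "rx_drop": v[3],
            "tx_errs": v[10], "tx_drop": v[11],
        }
    return result


net_dev_sample = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1234      12    0    0    0     0          0         0     1234      12    0    0    0     0       0          0
  eth0:   98765     432    3    7    0     0          0         0    56789     321    0    2    0     0       0          0
"""
print(interface_errors(net_dev_sample)["eth0"])
```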
What are your examples?
http://upload.wikimedia.org/wikipedia/commons/f/f3/Uncle_Sam_(pointing_finger).jpg
Applications
Running out of Apache Threads
• Lots of incoming requests
• Apache hits ServerLimit of threads (Utilization!)
• Requests start to get stuck in the TCP backlog (Saturation!)
• Apache endpoints are removed from load balancers (Error!)
• Fail!
http://upload.wikimedia.org/wikipedia/commons/9/96/Colorful_Threads_(3965274345).jpg
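The same cascade can be mimicked with a toy model: a fixed pool of workers, a bounded backlog in front of it, and anything beyond both left to fail. This Python sketch is a single-instant simplification with hypothetical parameter names and numbers (no arrival timing, no retries); in the Apache case the two limits correspond to the thread limit and the TCP backlog.

```python
def snapshot(concurrent_requests, workers, backlog):
    """Split concurrent requests into served, queued, and failed.

    workers: thread limit (utilization); backlog: queue depth (saturation).
    """
    served = min(concurrent_requests, workers)
    queued = min(concurrent_requests - served, backlog)
    failed = concurrent_requests - served - queued  # errors/timeouts
    return {
        "utilization": served / workers,
        "saturation": queued,
        "errors": failed,
    }


print(snapshot(40, workers=256, backlog=128))
print(snapshot(500, workers=256, backlog=128))
```

The ordering matches the slide: utilization hits 100% first, then the queue fills, and only then do errors show up.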
Cold DB Start
• DBs like to be in memory, but can’t start that way
• All data requests go to disk (which is SAN-backed)
• SAN controller CPU gets maxed out (Utilization!)
• HBA queues get deep (Saturation!)
• Requests time out (Error!)
• Fail!
Summary
Methods > Tools
• Don’t let tools get in the way of solutions
• It’s easy to think that all you’re missing is a tool.
• But are you actually following a method to your performance madness?
http://upload.wikimedia.org/wikipedia/commons/6/6d/Three_Card_Monte.jpg
Anti-Methods
• Blame Someone Else
• Streetlight
• Drunk Man
• Random Change
• Passive Benchmark
• Don’t do these…
http://www.brendangregg.com/methodology.html http://upload.wikimedia.org/wikipedia/commons/a/af/Villainc.svg
Methods
• Ad Hoc Checklist
• Problem Statement
• Scientific
• Workload Characterization
• Drill-down Analysis
• By-layer
• Latency Analysis
• Tools
• Stack Profile
• Off-CPU Analysis
• Thread State Analysis
• Active Benchmark
http://www.brendangregg.com/methodology.html http://memegenerator.net/instance/9192015
Linux Performance Tools
Chris McEniry
LOPSA-SD
March 27, 2014
