Performance Instrumentation Beyond What You Do Now - Presentation Transcript
Cary Millsap
cary.millsap@method-r.com
Percona Performance Conference
Santa Clara, California
9:00a–9:55a Thursday 23 April 2009
[Timeline figure: 1986, 1989, 1999, 2008; Software Developer and Performance Analyst]
Method R Corporation
http://method-r.com
What we do at Method R Corporation…
• Write code for you
• Troubleshoot performance problems
• Teach you how to do what we do
• Write software tools that make your work easier
Thinking clearly about performance
Performance is HARD
“Our users say that everything is slow, but I don’t know where to begin.”
“Our users are complaining, but all our dials are green.”
But how can you know what causes a specific task to be slow?
It's latches
It's I/O
It's always I/O
It's bad SQL
It's always bad SQL
There's not enough memory
There's never enough memory
My problem…
How can you possibly know that?
Reminded me of…
[Image: vailroger.googlepages.com/orionconstellation]
You do see it... Right?
But who says that is what you have to see?
Why not?
Performance is hard.
A good pilot makes it look easy.
—Van R. Millsap
1936–2004
Performance is EASY
How?
It’s the user’s experience that matters.
A user’s performance experience consists of two elements…
1. a task
2. time
Task
The things we used to “computerize”… tasks.
[Image: http://olathe.lib.ks.us/images/Image/Computer%20User.jpg]
A task is a business unit of work.
• Post to the General Ledger
• Enter an order
• Look up a book by author
Tasks can nest.
• Print Addresses is a task
• Print Address #42 is a (sub)task
• Often, a program is a task
• Often, a tiny part of a program is a task
[Figure: a Posting task made up of subtasks PO AP AR … FA]
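Because tasks nest, an instrumented program can time a task and each of its subtasks in the same way. The following is a minimal sketch, assuming a hypothetical `task()` context manager defined right here (not an existing library API); it records each task's name, its parent task, and its elapsed wall-clock time, using the "Print Addresses" example from the slide with `time.sleep` standing in for real work.

```python
import time
from contextlib import contextmanager

records = []      # (task name, parent task name or None, elapsed seconds)
_open_tasks = []  # names of tasks currently running, innermost last

@contextmanager
def task(name):
    """Hypothetical helper: time one task; nested `with task(...)` blocks become subtasks."""
    parent = _open_tasks[-1] if _open_tasks else None
    _open_tasks.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _open_tasks.pop()
        records.append((name, parent, elapsed))

with task("Print Addresses"):
    for n in range(1, 4):
        with task(f"Print Address #{n}"):
            time.sleep(0.01)  # stand-in for the real work of printing one address

for name, parent, elapsed in records:
    print(f"{name:<20} parent={parent!s:<16} {elapsed:.3f}s")
```

Since each subtask runs inside its parent's interval, the parent's response time is always at least the sum of its subtasks' response times; the difference is time the parent spent outside any measured subtask.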
Tasks are it.
Business people don’t care about the “system” except through execution of the tasks that make up their business.
Tasks are what system owners care about.
Time
Performance is about time.
How fast: “Daddy, can your car go 500 miles?”
He meant “500 miles per hour.”
To talk about performance (speed), you have to talk about time.
Two ways to measure performance…
tasks per time (that’s throughput)
time per task (that’s response time)
Throughput and response time…
• Throughput (X)
– The tasks-per-time way
– Number of task executions completed in a given duration
• “orders/second”
• Response time (R)
– The time-per-task way
– Elapsed duration of an execution of a given task
• “seconds/order”
X = 1/R (kind of)
Average throughput is the inverse of average response time.
X = 1,000 txn/sec?
Then R = (1 sec)/(1,000 txn) = 0.001 sec/txn
But…
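The "(kind of)" matters. The sketch below, using hypothetical task timings, computes X and R directly from (start, end) pairs. When tasks run strictly one after another with no idle gaps, X equals 1/R exactly; when tasks overlap, X and 1/R diverge. The function names `throughput` and `mean_response_time` are illustrative, not from any library.

```python
from statistics import mean

def throughput(tasks):
    """X: tasks completed per second over the observed interval."""
    start = min(s for s, _ in tasks)
    end = max(e for _, e in tasks)
    return len(tasks) / (end - start)

def mean_response_time(tasks):
    """R: average elapsed seconds per task."""
    return mean(e - s for s, e in tasks)

# Hypothetical serial run: four 1 ms tasks, one after another on one worker.
serial = [(i * 0.001, (i + 1) * 0.001) for i in range(4)]

# Hypothetical concurrent run: the same four 1 ms tasks, two at a time.
concurrent = [(0.000, 0.001), (0.000, 0.001), (0.001, 0.002), (0.001, 0.002)]

for label, run in (("serial", serial), ("concurrent", concurrent)):
    X = throughput(run)
    R = mean_response_time(run)
    print(f"{label:>10}: X = {X:.0f} tasks/sec, 1/R = {1 / R:.0f} tasks/sec")
```

In the serial run both numbers come out at 1,000 per second; in the concurrent run X doubles while each task's R stays at 1 ms, so X = 1/R no longer holds.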
…Adding load to create higher throughput changes response time.
…Which leads to a whole ’nother conversation I’d love to have with you some other time.
Sequence Diagram
A simple way to view response time is with a UML sequence diagram.
[Sequence diagram: response time R_A. Source: http://www.websequencediagrams.com]
More complicated systems have nested levels of suppliers and consumers.
[Sequence diagram: response times R_A and R_B. Source: http://www.websequencediagrams.com]
The tiers represent the way your system is constructed.
[Sequence diagram: response time R_User. Source: http://www.websequencediagrams.com]
This sequence diagram shows the complicated interactions among consumers and suppliers.
[Sequence diagram: response time R_User. Source: http://www.websequencediagrams.com]
The sequence diagram is a good conceptual tool.
But when you need to analyze thousands of calls, you need something else.
Profile
A profile is a complete account of a task’s response time.
Response time (s)     %R    # Calls   R/call (s)   Call name
      0.769        50.3%      5,003     0.000154   unaccounted-for between dbcalls
      0.393        25.7%      5,010     0.000078   SQL*Net message from client
      0.381        24.9%      5,013     0.000076   CPU service, execute calls
      0.090         5.9%         11     0.008194   CPU service, prepare calls
      0.027         1.8%          1     0.027396   log file sync
      0.008         0.5%      5,010     0.000002   SQL*Net message to client
      0.000         0.0%          9     0.000000   CPU service, fetch calls
     –0.138        –9.1%      5,031    –0.000028   unaccounted-for within dbcalls
      1.530       100.0%                           Total
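A profile like the one above can be computed from call-by-call records plus one timer around the whole task. The following is a minimal sketch, assuming hypothetical records of the form (call name, duration in seconds) and a separately measured total R; whatever the calls don't explain is reported as "unaccounted-for", so the rows always sum to R. The `profile` function and the sample data are illustrative only.

```python
from collections import defaultdict

def profile(calls, response_time):
    """Return rows of (call name, seconds, %R, call count, seconds per call).

    Rows are sorted by descending contribution to R. R minus the sum of all
    call durations is reported as "unaccounted-for"; it is negative whenever
    the call timings add up to more than R (for example, through rounding).
    """
    totals = defaultdict(float)  # call name -> total seconds
    counts = defaultdict(int)    # call name -> number of calls
    for name, duration in calls:
        totals[name] += duration
        counts[name] += 1
    rows = [(seconds, counts[name], name) for name, seconds in totals.items()]
    rows.append((response_time - sum(totals.values()), 0, "unaccounted-for"))
    rows.sort(reverse=True)  # largest contribution to R first
    return [
        (name, seconds, 100.0 * seconds / response_time, n,
         seconds / n if n else None)
        for seconds, n, name in rows
    ]

# Hypothetical call-by-call data for one task whose measured R was 0.75 s.
calls = [("CPU service", 0.40), ("disk read", 0.05), ("disk read", 0.15), ("network", 0.10)]
for name, seconds, pct, n, per_call in profile(calls, response_time=0.75):
    per = f"{per_call:.6f}" if per_call is not None else "-"
    print(f"{seconds:6.3f} {pct:6.1f}% {n:6d} {per:>10}  {name}")
```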
Profile
• Full account of response time
– Spanning (sum ≮ R)
– Non-overlapping (sum ≯ R)
• Sorted by descending R
• Useful dimension
– Flat profile
– Call graph
• Contributions as %R
• Duration per call
– Mean, minimum, maximum, …
– Skew
• Drill-down
– Individual call level of detail
– Maybe even deeper
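Why list minimum, maximum, and skew next to the mean? Two call names can have identical mean durations yet very different distributions. The sketch below computes per-call summary statistics and a simple skew histogram; the bucket edges and the sample durations are hypothetical choices for illustration.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical bucket upper bounds in seconds, from 1e-6 s up to 0.1 s.
BUCKETS = [0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1]

def duration_stats(durations):
    """Summary statistics for one call name's per-call durations."""
    return {"calls": len(durations), "total": sum(durations),
            "mean": mean(durations), "min": min(durations), "max": max(durations)}

def skew_histogram(durations):
    """Slot 0 counts d < BUCKETS[0]; slot i counts BUCKETS[i-1] <= d < BUCKETS[i];
    the last slot counts d >= 0.1 s."""
    counts = [0] * (len(BUCKETS) + 1)
    for d in durations:
        counts[bisect_right(BUCKETS, d)] += 1
    return counts

# Hypothetical: two call names with the same 2 ms mean but very different skew.
steady = [0.002] * 10
spiky = [0.0002] * 9 + [0.0182]

for label, ds in (("steady", steady), ("spiky", spiky)):
    s = duration_stats(ds)
    print(f"{label:>6}: mean={s['mean']:.4f} min={s['min']:.4f} "
          f"max={s['max']:.4f} buckets={skew_histogram(ds)}")
```

The means match, but the histogram shows that the "spiky" call spends most of its time in one slow execution, which an average alone would hide.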
Response Time
To optimize throughput, you must analyze response time.
(Proof)
You cannot optimize X for a task that’s inefficient.
You cannot measure a task’s efficiency without measuring its R.
Therefore, to optimize X, you must first analyze R.
The universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail.
—Donald Knuth
(Programmers aren’t very good at guessing where their code spends time.)
To optimize performance (throughput or response time), people need profiles.
Performance is EASY
Performance is easy if you can stop guessing where your code is slow.
When you have profiles for task response times, performance problems cannot hide from you.
Some surprising things I’ve learned by measuring R…
Disk I/O is often less important than people think.
http://carymillsap.blogspot.com/2009/04/cary-on-joel-on-ssd.html
Common performance problems:
• CPU
• Network I/O
• Software serialization
The point…
Your problems have nothing to do with experiences I’ve had.
So measure.
Finding what you need to see
How are you supposed to create these profiles?
You have to insist on seeing where time goes for any task you think is important.
To drill down, you need call-by-call data.
(NOT data about aggregations of calls.)
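One way to see why task-scoped, call-by-call data matters: a system-wide aggregate blends every task together, so it can point at the wrong problem for the one user who is complaining. The sketch below uses hypothetical records of the form (task id, call name, seconds) and a hypothetical helper `time_by_call`; the same records are summed once across all tasks and once for a single task.

```python
from collections import Counter

# Hypothetical call-by-call records: (task id, call name, seconds).
call_records = [
    ("nightly-batch", "disk read", 9.0), ("nightly-batch", "CPU", 1.0),
    ("order-7", "network", 0.8), ("order-7", "disk read", 0.1), ("order-7", "CPU", 0.1),
]

def time_by_call(records):
    """Sum seconds per call name; return (name, seconds) pairs, largest first."""
    totals = Counter()
    for _task_id, name, seconds in records:
        totals[name] += seconds
    return totals.most_common()

system_wide = time_by_call(call_records)
one_task = time_by_call(r for r in call_records if r[0] == "order-7")
print("system-wide:", system_wide)
print("order-7:    ", one_task)
```

The system-wide totals say disk reads dominate, but the user waiting on order-7 is mostly waiting on the network; only the task-scoped view shows that.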
In Oracle, we do it with a feature called extended SQL tracing.
• For Developers: Making Friends with the Oracle Database for Fast, Scalable Applications
– Cary Millsap
http://method-r.com/downloads/doc_details/10-for-developers-making-friends-with-the-oracle-database-cary-millsap
• Optimizing Oracle Performance
– Cary Millsap with Jeff Holt
The stuff you need…
Feature (attribute)        Oracle              MySQL   App tier
Task identification        y
Call-by-call coverage      98%+
DB call begin sequence     partly derivable
DB call begin time         partly derivable
DB call end time           y
DB call context info       y
OS call begin sequence     partly derivable
OS call begin time         derivable
OS call end time           y
OS call context info       y
Call SQL context           y
Call CPU (sys mode)        -
Call CPU (usr mode)        -
Call CPU (total)           y
SQL execution plans        y
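For the "App tier" column, the application itself has to record these attributes. The following is a minimal sketch of what that might look like, assuming a hypothetical `Tracer` class (not an existing library) that wraps each call and records task identification, begin sequence, begin and end time, free-form context, and CPU time split into user and system mode. It uses `os.times()`, which reports CPU for the whole process, so concurrent threads would blur the per-call CPU figures.

```python
import itertools
import os
import time
from contextlib import contextmanager
from dataclasses import dataclass

@dataclass
class CallRecord:
    task_id: str    # which business task this call belongs to
    seq: int        # begin sequence: the order in which calls started
    name: str       # call name
    context: dict   # free-form context info (ids, SQL text, and so on)
    begin: float    # wall-clock begin time, seconds since the epoch
    elapsed: float  # duration measured with a monotonic clock, seconds
    cpu_usr: float  # user-mode CPU seconds consumed during the call
    cpu_sys: float  # system-mode CPU seconds consumed during the call

    @property
    def end(self):
        return self.begin + self.elapsed

    @property
    def cpu_total(self):
        return self.cpu_usr + self.cpu_sys

class Tracer:
    """Hypothetical per-task tracer: one instance per task execution."""

    def __init__(self, task_id):
        self.task_id = task_id
        self._seq = itertools.count(1)
        self.records = []

    @contextmanager
    def call(self, name, **context):
        seq = next(self._seq)  # assigned when the call begins
        begin, p0, c0 = time.time(), time.perf_counter(), os.times()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - p0
            c1 = os.times()
            self.records.append(CallRecord(
                self.task_id, seq, name, context, begin, elapsed,
                c1.user - c0.user, c1.system - c0.system))

tracer = Tracer(task_id="Enter an order")
with tracer.call("validate order", order_id=42):
    sum(i * i for i in range(100_000))  # stand-in for CPU-bound work
with tracer.call("wait for supplier"):
    time.sleep(0.01)                    # stand-in for waiting on another tier
for r in tracer.records:
    print(r.seq, r.name, r.context, f"R={r.elapsed:.4f}s", f"CPU={r.cpu_total:.4f}s")
```

Records are appended when each call ends, so for nested calls the list order differs from the begin order; the explicit `seq` field preserves the begin sequence.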
Recap
Here’s what I hope you take away today…
Performance is about time and tasks.
If you’re interested in performance, then read Goldratt’s The Goal.
Don’t guess; you’re probably wrong.
Measure response time before you optimize anything.
Insist on it.
Performance is easy (and fun!) when code measures its own time and tasks.