Oracle Performance
Cary Millsap
Method R Corporation and Accenture Enkitec Group
@CaryMillsap
cary.millsap@method-r.com · cary.millsap@enkitec.com
Dallas Oracle Users Group Meeting
Texas State Government Facility whose name must not be spoken
Richardson, Texas
5:00p–11:00p CST Thursday 22 January 2015
© 2015 Method R Corporation
[Timeline slide, 1985 to 2020, with the labels Hotsos, Method R, Optimal Flexible Architecture, Oracle APS, System Performance Group, Method R Profiler, Method R Tools, and Method R Trace.]
http://amzn.to/173bpzg
1 What I’ll talk about today
When you execute a business task on a computer system, you create an experience.
…An experience between you and the machine.
The duration of this experience is called response time.
A sequence diagram helps you understand how that response time was consumed.
A profile is a useful aggregation
of the sequence diagram.
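A minimal sketch of that aggregation, assuming the sequence diagram is available as a list of (call name, duration) events. The function name build_profile and the sample events are hypothetical; the column layout mirrors the profiles shown later in this deck.

```python
from collections import defaultdict


def build_profile(calls):
    """Aggregate (call_name, duration_seconds) events into profile rows.

    Returns (name, total, percent, call_count, mean) tuples,
    sorted by descending total duration.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, duration in calls:
        totals[name] += duration
        counts[name] += 1
    grand_total = sum(totals.values())
    rows = []
    for name, total in totals.items():
        pct = 100.0 * total / grand_total if grand_total else 0.0
        rows.append((name, total, pct, counts[name], total / counts[name]))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical events, in the order a sequence diagram would show them.
events = [
    ("SQL*Net message from client", 2.0),
    ("EXEC", 0.5),
    ("SQL*Net message from client", 3.0),
    ("FETCH", 0.5),
]

for name, total, pct, count, mean in build_profile(events):
    print(f"{name:<28} {total:9.6f} {pct:5.1f}% {count:5d} {mean:9.6f}")
```

The profile loses the ordering of the calls but keeps what matters for the first question an analyst asks: where did the time go?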
A performance analyst looks at your time consumption to determine whether it is possible to reduce the response time of the experience.
…And, if so, then by how much.
The richest and easiest diagnostic information to obtain in this whole technology stack is available from the Oracle Database tier.
…Oracle’s extended SQL trace data.
But on almost 100% of first tries at using Oracle extended SQL trace data, people make a data collection mistake that complicates their analysis.
This is the story of that mistake.
2 Sequence diagrams and profiles
CALL-NAME                                               DURATION       %       CALLS       MEAN
-------------------------------------------------  -------------  ------  ----------  ---------
db file sequential read                            59,081.406102   76.6%  10,013,394   0.005900
log buffer space                                    6,308.758563    8.2%       9,476   0.665762
free buffer waits                                   4,688.730190    6.1%     200,198   0.023420
EXEC                                                4,214.190000    5.5%      36,987   0.113937
log file switch completion                          1,552.471890    2.0%       1,853   0.837815
db file parallel read                                 464.976815    0.6%       7,641   0.060853
log file switch (checkpoint incomplete)               316.968886    0.4%         351   0.903045
rdbms ipc reply                                       244.937910    0.3%       2,737   0.089491
undo segment extension                                140.267429    0.2%       1,411   0.099410
log file switch (private strand flush incomplete)     112.680587    0.1%         134   0.840900
17 others                                              23.367228    0.0%      58,126   0.000402
-------------------------------------------------  -------------  ------  ----------  ---------
TOTAL (27)                                         77,148.755600  100.0%  10,332,308   0.007467
Key question:
What is it you’re trying to optimize?
Slow department?
Then analyze the stream of experiences.
Verdict: clerk too slow between experiences.
CALL-NAME                    DURATION       %  CALLS       MEAN
---------------------------  --------  ------  -----  ---------
SQL*Net message from client       137   87.3%      7  19.571429
everything else                    20   12.7%    142   0.140845
---------------------------  --------  ------  -----  ---------
TOTAL (2)                         157  100.0%    149   1.053691
Slow application?
Then analyze each experience separately.
THE ONE YOU’LL BE DOING MOST OF THE TIME.
Verdict: app is too chatty.
CALL-NAME                    DURATION       %  CALLS      MEAN
---------------------------  --------  ------  -----  --------
SQL*Net message from client         8   57.1%      4  2.000000
some of this, some of that          6   42.9%     28  0.214286
---------------------------  --------  ------  -----  --------
TOTAL (2)                          14  100.0%     32  0.437500
CALL-NAME                    DURATION       %  CALLS      MEAN
---------------------------  --------  ------  -----  --------
SQL*Net message from client        11   52.4%      4  2.750000
some of this, some of that         10   47.6%    113  0.088496
---------------------------  --------  ------  -----  --------
TOTAL (2)                          21  100.0%    117  0.179487
3 Oracle extended SQL trace data
Typical Oracle trace file for a connection pool:
...
WAIT ... nam='SQL*Net message from client' ela= 1202689 ...
A sequence of trace lines explaining time consumption for Experience A
WAIT ... nam='SQL*Net message from client' ela= 4260917 ...
A sequence of trace lines explaining time consumption for Experience B
WAIT ... nam='SQL*Net message from client' ela= 5213365 ...
A sequence of trace lines explaining time consumption for Experience C
WAIT ... nam='SQL*Net message from client' ela= 2044420 ...
...
So, just ignore all the SQL*Net message from client. Right?
BIG MISTAKE.
What I didn’t mention before…
These experiences like A, B, and C can have
SQL*Net message from client calls in them, too.
…That might dominate response times!
WAIT ... nam='SQL*Net message from client' ela= 1202689 ...
stuff for Experience A
WAIT ... nam='SQL*Net message from client' ela= 342
more stuff for Experience A
WAIT ... nam='SQL*Net message from client' ela= 1492
yet more stuff for experience A
etc.
WAIT ... nam='SQL*Net message from client' ela= 4260917 ...
stuff for Experience B
WAIT ... nam='SQL*Net message from client' ela= 2928
more stuff for Experience B
etc.
WAIT ... nam='SQL*Net message from client' ela= 5213365 ...
stuff for Experience C
WAIT ... nam='SQL*Net message from client' ela= 855
more stuff for Experience C
etc.
WAIT ... nam='SQL*Net message from client' ela= 2044420 ...
It’s actually a common pattern.
Behold the network-abusing, chatty app…
CALL-NAME                      DURATION       %    CALLS      MEAN       MIN       MAX
---------------------------  ----------  ------  -------  --------  --------  --------
SQL*Net message from client  200.939935   99.5%  142,520  0.001410  0.000937  0.202835
SQL*Net message to client      0.526257    0.3%  142,520  0.000004  0.000000  0.000130
FETCH                          0.439933    0.2%  142,518  0.000003  0.000000  0.001000
PARSE                          0.000000    0.0%        2  0.000000  0.000000  0.000000
EXEC                           0.000000    0.0%        2  0.000000  0.000000  0.000000
---------------------------  ----------  ------  -------  --------  --------  --------
TOTAL (5)                    201.906125  100.0%  427,562  0.000472  0.000000  0.202835
…and the way it should behave.
CALL-NAME                      DURATION       %    CALLS      MEAN       MIN       MAX
---------------------------  ----------  ------  -------  --------  --------  --------
SQL*Net message from client    0.911041   36.5%       72  0.012653  0.000890  0.026857
SQL*Net more data to client    0.841897   33.7%    2,688  0.000313  0.000004  0.013287
FETCH                          0.744885   29.8%       70  0.010641  0.006999  0.012998
PARSE                          0.001000    0.0%        2  0.000500  0.000000  0.001000
SQL*Net message to client      0.000147    0.0%       72  0.000002  0.000001  0.000006
EXEC                           0.000000    0.0%        2  0.000000  0.000000  0.000000
---------------------------  ----------  ------  -------  --------  --------  --------
TOTAL (6)                      2.498970  100.0%    2,906  0.000860  0.000000  0.026857
4 Oceans, islands, rivers
Such trace files have islands of activity in an ocean of idleness.
But…
An island can have rivers.
So can a trace file.
WAIT ... nam='SQL*Net message from client' ela= 1202689 ...
stuff for Experience A
WAIT ... nam='SQL*Net message from client' ela= 342
more stuff for Experience A
WAIT ... nam='SQL*Net message from client' ela= 1492
yet more stuff for experience A
etc.
WAIT ... nam='SQL*Net message from client' ela= 4260917 ...
stuff for Experience B
WAIT ... nam='SQL*Net message from client' ela= 2928
more stuff for Experience B
etc.
WAIT ... nam='SQL*Net message from client' ela= 5213365 ...
stuff for Experience C
WAIT ... nam='SQL*Net message from client' ela= 855
more stuff for Experience C
etc.
WAIT ... nam='SQL*Net message from client' ela= 2044420 ...
5 How to cope with the problem
…the problem?
Trace file with oceans. Find the 2.3-sec experience.
What percentage of this 2.3-sec experience is rivers?
CALL-NAME                     DURATION       %   CALLS      MEAN       MIN        MAX
---------------------------  ---------  ------  ------  --------  --------  ---------
SQL*Net message from client  31.018640   99.3%  10,003  0.003101  0.000023  20.121507
direct path read              0.110575    0.4%  10,000  0.000011  0.000004   0.020533
FETCH                         0.081993    0.3%   5,001  0.000016  0.000000   0.001000
SQL*Net message to client     0.008804    0.0%  10,003  0.000001  0.000000   0.000061
PARSE                         0.003999    0.0%       2  0.001999  0.000000   0.003999
EXEC                          0.001000    0.0%       2  0.000500  0.000000   0.001000
CLOSE                         0.000000    0.0%       2  0.000000  0.000000   0.000000
---------------------------  ---------  ------  ------  --------  --------  ---------
TOTAL (7)                    31.225011  100.0%  35,013  0.000892  0.000000  20.121507
Trace file with no water at all. Doesn’t explain the 2.3-sec experience.
CALL-NAME                     DURATION       %   CALLS      MEAN       MIN        MAX
---------------------------  ---------  ------  ------  --------  --------  ---------
direct path read              0.110575   53.6%  10,000  0.000011  0.000004   0.020533
FETCH                         0.081993   39.7%   5,001  0.000016  0.000000   0.001000
SQL*Net message to client     0.008804    4.3%  10,003  0.000001  0.000000   0.000061
PARSE                         0.003999    1.9%       2  0.001999  0.000000   0.003999
EXEC                          0.001000    0.5%       2  0.000500  0.000000   0.001000
CLOSE                         0.000000    0.0%       2  0.000000  0.000000   0.000000
---------------------------  ---------  ------  ------  --------  --------  ---------
TOTAL (6)                     0.206371  100.0%  25,010  0.000008  0.000000   0.020533
Trace file with rivers, but no oceans. Explains the 2.3-sec experience exactly.
CALL-NAME                     DURATION       %   CALLS      MEAN       MIN        MAX
---------------------------  ---------  ------  ------  --------  --------  ---------
SQL*Net message from client   2.072877   90.9%  10,001  0.000207  0.000023   0.016861
direct path read              0.110575    4.9%  10,000  0.000011  0.000004   0.020533
FETCH                         0.081993    3.6%   5,001  0.000016  0.000000   0.001000
SQL*Net message to client     0.008804    0.4%  10,003  0.000001  0.000000   0.000061
PARSE                         0.003999    0.2%       2  0.001999  0.000000   0.003999
EXEC                          0.001000    0.0%       2  0.000500  0.000000   0.001000
CLOSE                         0.000000    0.0%       2  0.000000  0.000000   0.000000
---------------------------  ---------  ------  ------  --------  --------  ---------
TOTAL (7)                     2.279248  100.0%  35,011  0.000065  0.000000   0.020533
90.9% is rivers. Easy.
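The 90.9% figure can be checked from the profile durations alone. A small worked sketch, using only numbers copied from the three profiles; the variable names are mine.

```python
# Durations in seconds, copied from the profiles.
snmfc_all = 31.018640     # SQL*Net message from client, trace with oceans
snmfc_rivers = 2.072877   # SQL*Net message from client, oceans removed
other_calls = 0.206371    # total of the trace with no water at all

experience = snmfc_rivers + other_calls        # the 2.3-sec experience
river_pct = 100 * snmfc_rivers / experience    # share of it spent in rivers
ocean = snmfc_all - snmfc_rivers               # time that is not response time

print(f"experience = {experience:.6f} s")
print(f"rivers     = {river_pct:.1f}%")
print(f"oceans     = {ocean:.6f} s")
```

The ocean time comes from only two calls (10,003 minus 10,001), which is why a single threshold on ela can separate them from the rivers in this file.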
To Oracle, it’s all just water.
It sees no difference between salt water and fresh water, between response-time SQL*Net message from client (SNMFC) calls and non-response-time SNMFC calls.
It’s all just SQL*Net message from client.
WAIT ... nam='SQL*Net message from client' ela= 1202689 ...
stuff for Experience A
WAIT ... nam='SQL*Net message from client' ela= 342
more stuff for Experience A
WAIT ... nam='SQL*Net message from client' ela= 1492
yet more stuff for experience A
etc.
WAIT ... nam='SQL*Net message from client' ela= 4260917 ...
stuff for Experience B
WAIT ... nam='SQL*Net message from client' ela= 2928
more stuff for Experience B
etc.
WAIT ... nam='SQL*Net message from client' ela= 5213365 ...
stuff for Experience C
WAIT ... nam='SQL*Net message from client' ela= 855
more stuff for Experience C
etc.
WAIT ... nam='SQL*Net message from client' ela= 2044420 ...
However, there are big SQL*Net message from client calls
…and little SQL*Net message from client calls.
This is a clue.
SQL*Net message from client
if ela ≥ 1.00 sec then ocean (not response time)
otherwise river (response time)
It actually works pretty well.
SQL*Net message from client
if ela ≥ :b then ocean (not response time)
otherwise river (response time)
Sometimes you have to fine-tune the boundary value.
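The rule above can be sketched in a few lines. This is an illustration under stated assumptions, not how any Method R tool is implemented: the function name split_into_islands is hypothetical, ela is assumed to be in microseconds, and the input is the simplified trace from the earlier slides.

```python
import re

# Matches the wait event and captures its ela value (assumed microseconds).
SNMFC = re.compile(r"nam='SQL\*Net message from client' ela= *(\d+)")


def split_into_islands(lines, boundary_sec=1.0):
    """Split trace lines into islands separated by ocean-sized SNMFC waits.

    An SNMFC wait with ela >= boundary_sec is an ocean (not response time)
    and ends the current island. Shorter SNMFC waits are rivers and stay
    inside the island they belong to.
    """
    islands, current = [], []
    for line in lines:
        match = SNMFC.search(line)
        is_ocean = (match is not None
                    and int(match.group(1)) >= boundary_sec * 1_000_000)
        if is_ocean:
            if current:
                islands.append(current)
            current = []
        else:
            current.append(line)
    if current:
        islands.append(current)
    return islands


trace = [
    "WAIT ... nam='SQL*Net message from client' ela= 1202689 ...",
    "stuff for Experience A",
    "WAIT ... nam='SQL*Net message from client' ela= 342",
    "more stuff for Experience A",
    "WAIT ... nam='SQL*Net message from client' ela= 4260917 ...",
    "stuff for Experience B",
    "WAIT ... nam='SQL*Net message from client' ela= 2928",
    "more stuff for Experience B",
    "WAIT ... nam='SQL*Net message from client' ela= 2044420 ...",
]

for number, island in enumerate(split_into_islands(trace), start=1):
    rivers = sum(1 for line in island if SNMFC.search(line))
    print(f"island {number}: {len(island)} lines, {rivers} river(s)")
```

With the default boundary this finds two islands, one per experience. Lowering boundary_sec to 0.001 turns the 2928-microsecond river into an ocean and splits Experience B in two, which is the kind of misclassification that fine-tuning the boundary is meant to avoid.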
$ mrskew --rc=txnz.05 v11203_ora_26827.trc
EXP-ID        DURATION       %   CALLS      MEAN       MIN       MAX
-----------  ---------  ------  ------  --------  --------  --------
          0  24.236626   27.5%     327  0.074118  0.050007  0.283979
      19547   2.212251    2.5%     807  0.002741  0.000000  0.049582
      27247   2.112561    2.4%     791  0.002671  0.000000  0.048360
      24221   1.927336    2.2%     267  0.007218  0.000000  0.048210
      16129   1.450686    1.6%     683  0.002124  0.000000  0.049147
      22289   0.997744    1.1%     643  0.001552  0.000000  0.045547
      29620   0.982700    1.1%     562  0.001749  0.000000  0.049281
       2843   0.967385    1.1%     655  0.001477  0.000000  0.048986
      33239   0.920264    1.0%     139  0.006621  0.000000  0.047733
      23031   0.917492    1.0%     647  0.001418  0.000000  0.049615
      17091   0.899165    1.0%     579  0.001553  0.000000  0.045020
      14701   0.864747    1.0%     123  0.007030  0.000000  0.049502
       6509   0.805075    0.9%     437  0.001842  0.000000  0.043662
        653   0.780152    0.9%     403  0.001936  0.000000  0.048553
      36583   0.773713    0.9%     484  0.001599  0.000000  0.030175
      26287   0.767064    0.9%     619  0.001239  0.000000  0.038591
       2333   0.750920    0.9%     103  0.007290  0.000000  0.045808
       9685   0.720571    0.8%     479  0.001504  0.000000  0.047614
      25107   0.718329    0.8%     115  0.006246  0.000000  0.043572
      28487   0.715467    0.8%     107  0.006687  0.000000  0.048749
309 others   43.717756   49.5%   8,389  0.005211  0.000000  0.049996
-----------  ---------  ------  ------  --------  --------  --------
TOTAL (329)  88.238004  100.0%  17,359  0.005083  0.000000  0.283979
6 How to fix the problem
For connection pooling apps, the oceans-islands-rivers thing works pretty well.
But it’s not 100% reliable.
For example, what if you have a river that’s bigger than one of your oceans?
If you know your app, you know where the experience boundaries are.
If you can instrument your app, it will automatically tell you where the experience boundaries are.
If you’re running code in an interactive development environment, it’s easy:
1. activate trace;
1.1. There must be NO LATENCY here.
2. execute the code path for the experience;
2.1. There must be NO LATENCY here.
3. deactivate trace;
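One way to keep steps 1 through 3 tight is to wrap them in a helper, so nothing can slip in between. A minimal sketch, assuming a DB-API style cursor with an execute method and assuming that DBMS_MONITOR.SESSION_TRACE_ENABLE called without a session id traces the current session. The names traced and RecordingCursor are hypothetical, and the recording cursor exists only so the sketch runs without a database.

```python
from contextlib import contextmanager

# Anonymous PL/SQL blocks; assumed to act on the current session.
ENABLE = "begin dbms_monitor.session_trace_enable(waits => true, binds => true); end;"
DISABLE = "begin dbms_monitor.session_trace_disable(); end;"


@contextmanager
def traced(cursor):
    """Trace exactly the statements executed inside the with-block."""
    cursor.execute(ENABLE)           # 1. activate trace
    try:
        yield cursor                 # 2. execute the code path for the experience
    finally:
        cursor.execute(DISABLE)      # 3. deactivate trace


class RecordingCursor:
    """Stand-in for a real database cursor: records what would be executed."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


cur = RecordingCursor()
with traced(cur) as c:
    c.execute("select count(*) from t")   # hypothetical experience
print(cur.statements)
```

The with-block only guarantees ordering. Keeping latency out of steps 1.1 and 2.1 still depends on the caller doing nothing slow between entering the block and running the code path.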
This is the best thing you can do:
Instrument your application so that the trace data explains exactly one user response time experience.
You can fix a trace file that accounts for more time than you want.
…E.g., if you’re stuck activating trace with dbms_monitor.session_trace_enable(:sid,:serial,true,true).
But fixing a trace file requires either
a) effort
b) tools
I’ll show you both.
Two types of trace file scoping problems:
1. Unwanted calls at the bottom
2. Unwanted calls at the top or middle
Problem 1: You can cut the bottom off a trace file.
…With, say, vi. No problem.
...
WAIT #0: nam='direct path read' ela= 7 file number=4 first dba=4665 block cnt=1
obj#=86815 tim=1313696204681916
WAIT #0: nam='direct path read' ela= 5 file number=4 first dba=4665 block cnt=1
obj#=86815 tim=1313696204681942
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0
obj#=86815 tim=1313696204681955
WAIT #0: nam='SQL*Net message from client' ela= 141 driver id=1650815232 #bytes=1 p3=0
obj#=86815 tim=1313696204682115
FETCH #5:c=0,e=5,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1601196873,tim=1313696204682136
STAT #5 id=1 cnt=5000 pid=0 pos=1 obj=86814 op='TABLE ACCESS FULL T (cr=5003 pr=0 pw=0
time=23585 us cost=11 size=10075000 card=5000)'
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0
obj#=86815 tim=1313696204682254
*** 2011-08-18 14:36:53.506
WAIT #5: nam='SQL*Net message from client' ela= 8824256 driver id=1650815232 #bytes=1
p3=0 obj#=86815 tim=1313696213506522
CLOSE #5:c=0,e=22,dep=0,type=0,tim=1313696213506643
=====================
PARSING IN CURSOR #2 len=55 dep=0 uid=84 oct=42 lid=84 tim=1313696213506753
hv=2217940283 ad='0' sqlid='06nvwn223659v'
alter session set events '10046 trace name context off'
END OF STMT
PARSE #2:c=0,e=70,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213506752
EXEC #2:c=1000,e=354,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213507146
The same excerpt, with the lines to cut marked with #:
...
WAIT #0: nam='direct path read' ela= 7 file number=4 first dba=4665 block cnt=1
obj#=86815 tim=1313696204681916
WAIT #0: nam='direct path read' ela= 5 file number=4 first dba=4665 block cnt=1
obj#=86815 tim=1313696204681942
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0
obj#=86815 tim=1313696204681955
WAIT #0: nam='SQL*Net message from client' ela= 141 driver id=1650815232 #bytes=1 p3=0
obj#=86815 tim=1313696204682115
FETCH #5:c=0,e=5,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1601196873,tim=1313696204682136
STAT #5 id=1 cnt=5000 pid=0 pos=1 obj=86814 op='TABLE ACCESS FULL T (cr=5003 pr=0 pw=0
time=23585 us cost=11 size=10075000 card=5000)'
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0
obj#=86815 tim=1313696204682254
# *** 2011-08-18 14:36:53.506
# WAIT #5: nam='SQL*Net message from client' ela= 8824256 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696213506522
# CLOSE #5:c=0,e=22,dep=0,type=0,tim=1313696213506643
# =====================
# PARSING IN CURSOR #2 len=55 dep=0 uid=84 oct=42 lid=84 tim=1313696213506753 hv=2217940283 ad='0' sqlid='06nvwn223659v'
# alter session set events '10046 trace name context off'
# END OF STMT
# PARSE #2:c=0,e=70,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213506752
# EXEC #2:c=1000,e=354,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213507146
But cutting calls out of the bottom of a trace file is almost never going to be enough.
Problem 2: Cutting calls out of the middle or the top requires magic.
*** 2011-08-18 14:36:21.576
*** SESSION ID:(23.42) 2011-08-18 14:36:21.576
*** CLIENT ID:() 2011-08-18 14:36:21.576
*** SERVICE NAME:(SYS$USERS) 2011-08-18 14:36:21.576
*** MODULE NAME:(SQL*Plus) 2011-08-18 14:36:21.576
*** ACTION NAME:() 2011-08-18 14:36:21.576
WAIT #8: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1
p3=0 obj#=-1 tim=1313696181576631
*** 2011-08-18 14:36:41.698
WAIT #8: nam='SQL*Net message from client' ela= 20121507 driver id=1650815232
#bytes=1 p3=0 obj#=-1 tim=1313696201698518
CLOSE #8:c=0,e=41,dep=0,type=1,tim=1313696201698681
=====================
PARSING IN CURSOR #7 len=352 dep=1 uid=84 oct=3 lid=84 tim=1313696201699956
hv=2904344320 ad='3e4f6d48' sqlid='f70vdzaqjtjs0'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB)
opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB)
NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1") FROM (SELECT /
*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ :"SYS_B_2" AS
C1, :"SYS_B_3" AS C2 FROM "T" "T") SAMPLESUB
END OF STMT
PARSE #7:c=0,e=402,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=1313696201699946
...
If you delete the SQL*Net message from client line (ela= 20121507), then its 20.121507-second contribution to the 20.122009 seconds between calls will be unexplained.
(1,313,696,201.698681 – 0.000041) – 1,313,696,181.576631 = 20.122009
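The same arithmetic in integer microseconds, which avoids any floating-point rounding. The values are copied from the trace excerpt; the variable names are mine, and the sketch assumes (as the formula does) that a call's tim marks its end and e its duration.

```python
# Values copied from the trace lines (all in microseconds).
tim_previous_wait = 1313696181576631   # tim of the 'SQL*Net message to client' WAIT
tim_close = 1313696201698681           # tim of CLOSE #8
e_close = 41                           # e (elapsed) of CLOSE #8
ela_snmfc = 20121507                   # ela of the 'SQL*Net message from client' WAIT

# Time between the end of the previous WAIT and the start of the CLOSE call.
gap_us = (tim_close - e_close) - tim_previous_wait

# What is left of that gap once the SNMFC wait is accounted for.
unexplained_us = gap_us - ela_snmfc

print(gap_us)          # 20122009
print(unexplained_us)  # 502
```

With the WAIT line present, all but 502 microseconds of the gap is explained. Delete the line without adjusting anything else and the whole 20.122009 seconds becomes unaccounted-for time.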
You can’t just delete the SQL*Net message from client line (or set its ela value to 0). You must also subtract 20.121507 seconds from every *** line and tim value from there to the end of the file.
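A rough sketch of the tim half of that ripple, to show why doing it by hand is tedious. It is not the mrcallrm implementation: remove_call is a hypothetical helper, the trace lines are abbreviated and unwrapped versions of the ones in the excerpt, and the *** timestamp lines are deliberately left unadjusted.

```python
import re

TIM = re.compile(r"\btim=(\d+)")
ELA = re.compile(r"ela= *(\d+)")


def remove_call(lines, index):
    """Drop the WAIT at lines[index] and shift every later tim back by its ela.

    Simplification: only tim= fields are adjusted. A complete fix would
    also have to rewrite the '***' timestamp lines that follow.
    """
    shift = int(ELA.search(lines[index]).group(1))
    kept = list(lines[:index])
    for line in lines[index + 1:]:
        kept.append(TIM.sub(lambda m: f"tim={int(m.group(1)) - shift}", line))
    return kept


# Abbreviated lines from the trace excerpt (some fields omitted).
trace = [
    "WAIT #8: nam='SQL*Net message to client' ela= 1 tim=1313696181576631",
    "WAIT #8: nam='SQL*Net message from client' ela= 20121507 tim=1313696201698518",
    "CLOSE #8:c=0,e=41,dep=0,type=1,tim=1313696201698681",
]

for line in remove_call(trace, 1):
    print(line)
```

After the shift, the CLOSE call begins 502 microseconds after the previous WAIT ends, so the 20-second ocean is gone and no unexplained gap is left in its place.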
Cutting the top is just like cutting the middle, because of the *** lines.
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /app/oracle/product/11.2.0/db_1
System name: Linux
Node name:local-orcl
Release: 2.6.18-194.el5
Version: #1 SMP Mon Mar 29 20:06:41 EDT 2010
Machine: i686
Instance name: yyz
Redo thread mounted by this instance: 1
Oracle process number: 25
Unix process pid: 10358, image: oracle@local-orcl (TNS V1-V3)
*** 2011-08-18 14:36:21.576
*** SESSION ID:(23.42) 2011-08-18 14:36:21.576
*** CLIENT ID:() 2011-08-18 14:36:21.576
*** SERVICE NAME:(SYS$USERS) 2011-08-18 14:36:21.576
*** MODULE NAME:(SQL*Plus) 2011-08-18 14:36:21.576
*** ACTION NAME:() 2011-08-18 14:36:21.576
WAIT #8: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1
p3=0 obj#=-1 tim=1313696181576631
*** 2011-08-18 14:36:41.698
WAIT #8: nam='SQL*Net message from client' ela= 20121507 driver id=1650815232
#bytes=1 p3=0 obj#=-1 tim=1313696201698518
CLOSE #8:c=0,e=41,dep=0,type=1,tim=1313696201698681
$ mrcallrm --lines=26,35078 yyz_ora_10358.trc > yyz_ora_10358-fixed.trc
7 References
http://www.slideshare.net/carymillsap/how-to-find-and-fix-your
A free online presentation about how to instrument your application so it will automatically tell you where the experience boundaries are.
http://method-r.com/blogs/company-blog/214-finding-connection-pool-response-times-with-method-r-tools
“Connection pool response times with Method R Tools (Oceans, Islands, and Rivers),” a blog post explaining the oceans-islands-rivers metaphor.
https://motdcr3.eventbrite.com
“Mastering Oracle Trace Data free online class reunion,” to be held 11:00a–12:30p CST Thursday, February 10, 2015.
http://amzn.to/173bpzg
“The Method R Guide to Mastering Oracle Trace Data,” a textbook for the 1- to 2-day course that covers Method R Corporation software and methods.
http://method-r.com/software/mrtrace
A Method R extension for Oracle SQL Developer. Method R Trace collects trace data and retrieves it for you, automatically.
http://method-r.com/software/mrtools
A set of software tools for mining and manipulating Oracle extended SQL trace data. I use mrskew to report on durations of individual experiences recorded in extended SQL trace files. I use mrcallrm to eliminate calls from my trace data. It automatically ripples the required tim and *** line changes throughout a trace file.
http://method-r.com/courses/mastering-oracle-trace-data
“Mastering Oracle Trace Data,” a 1- to 2-day course that covers Method R Corporation software and methods.
8 Your turn
www.enkitec.com
method-r.com
Thank you

Oracle trace data collection errors: the story about oceans, islands, and rivers

  • 1.
    Oracle Pe ormance CaryMillsap Method R Corporation and Accenture Enkitec Group @CaryMillsap cary.millsap@method-r.com · cary.millsap@enkitec.com Dallas Oracle Users Group Meeting Texas State Government Facility whose name must not be spoken Richardson, Texas 5:00p–11:00p CST Thursday 22 January 2015 © 2015 Method R Corporation
  • 2.
    @CaryMillsap Cary Millsap 2020 2015 2010 2005 2000 1995 1990 1985 100 454 ?? TM MeTHOD R TM hotsos Optimal Flexible Architecture Oracle APS System Pe ormance Group Method R Profiler Method R Tools Method R Trace
  • 3.
  • 4.
    @CaryMillsap 4 1 WhatI’ll talk about today
  • 5.
    @CaryMillsap When you executea business task on a computer system, you create an experience. …An experience between you and the machine. 5
  • 6.
    @CaryMillsap The duration ofthis experience is called response time. 6
  • 7.
    @CaryMillsap A sequence diagramhelps you understand how that response time was consumed. 7
  • 8.
    @CaryMillsap A profile isa useful aggregation of the sequence diagram. 8
  • 9.
    @CaryMillsap A pe ormanceanalyst looks at your time consumptions to determine whether it is possible to reduce the response time of the experience. …And, if so, then by how much. 9
  • 10.
    @CaryMillsap The richest andeasiest diagnostic information to obtain in this whole technology stack is available from the Oracle Database tier. …Oracle’s extended SQL trace data. 10
  • 11.
    @CaryMillsap But in almost100% of first tries with using Oracle extended SQL trace data, people make a data collection mistake that complicates their analysis. 11
  • 12.
    @CaryMillsap This is thestory of that mistake. 12
  • 13.
    @CaryMillsap 13 2 Sequencediagrams and profiles
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
    @CaryMillsap 18 CALL-NAME DURATION% CALLS MEAN ------------------------------------------------- ------------- ------ ---------- --------- db file sequential read 59,081.406102 76.6% 10,013,394 0.005900 log buffer space 6,308.758563 8.2% 9,476 0.665762 free buffer waits 4,688.730190 6.1% 200,198 0.023420 EXEC 4,214.190000 5.5% 36,987 0.113937 log file switch completion 1,552.471890 2.0% 1,853 0.837815 db file parallel read 464.976815 0.6% 7,641 0.060853 log file switch (checkpoint incomplete) 316.968886 0.4% 351 0.903045 rdbms ipc reply 244.937910 0.3% 2,737 0.089491 undo segment extension 140.267429 0.2% 1,411 0.099410 log file switch (private strand flush incomplete) 112.680587 0.1% 134 0.840900 17 others 23.367228 0.0% 58,126 0.000402 ------------------------------------------------- ------------- ------ ---------- --------- TOTAL (27) 77,148.755600 100.0% 10,332,308 0.007467 CALL-NAME DURATION % CALLS MEAN ------------------------------------------------- ------------- ------ ---------- --------- db file sequential read 59,081.406102 76.6% 10,013,394 0.005900 log buffer space 6,308.758563 8.2% 9,476 0.665762 free buffer waits 4,688.730190 6.1% 200,198 0.023420 EXEC 4,214.190000 5.5% 36,987 0.113937 log file switch completion 1,552.471890 2.0% 1,853 0.837815 db file parallel read 464.976815 0.6% 7,641 0.060853 log file switch (checkpoint incomplete) 316.968886 0.4% 351 0.903045 rdbms ipc reply 244.937910 0.3% 2,737 0.089491 undo segment extension 140.267429 0.2% 1,411 0.099410 log file switch (private strand flush incomplete) 112.680587 0.1% 134 0.840900 17 others 23.367228 0.0% 58,126 0.000402 ------------------------------------------------- ------------- ------ ---------- --------- TOTAL (27) 77,148.755600 100.0% 10,332,308 0.007467 =
  • 19.
  • 20.
    @CaryMillsap Key question: What isit you’re trying to optimize? 20
  • 21.
    @CaryMillsap Slow depa ment? Thenanalyze the steam of experiences. Verdict: clerk too slow between experiences. 21 = CALL-NAME DURATION % CALLS MEAN --------------------------- -------- ------ ----- --------- SQL*Net message from client 137 87.3% 7 19.571429 everything else 20 12.7% 142 0.140845 --------------------------- -------- ------ ----- --------- TOTAL (2) 157 100.0% 149 1.053691
  • 22.
    THE ONE YOU’LL BE DOING MOST OF THE TIME.
    Slow application? Then analyze each experience separately.
    Verdict: app is too chatty.

    CALL-NAME                   DURATION      % CALLS     MEAN
    --------------------------- -------- ------ ----- --------
    SQL*Net message from client        8  57.1%     4 2.000000
    some of this, some of that         6  42.9%    28 0.214286
    --------------------------- -------- ------ ----- --------
    TOTAL (2)                         14 100.0%    32 0.382550

    CALL-NAME                   DURATION      % CALLS     MEAN
    --------------------------- -------- ------ ----- --------
    SQL*Net message from client       11  52.3%     4 2.750000
    some of this, some of that        10  47.7%   113 0.088496
    --------------------------- -------- ------ ----- --------
    TOTAL (1)                         21 100.0%   117 0.382550
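Each of these profiles is just an aggregation of individual call durations by call name. Here is a minimal sketch in Python of that aggregation. It is not a Method R tool; the function name `profile` and the sample calls are made up for illustration.

```python
from collections import defaultdict

def profile(calls):
    """Aggregate (call name, duration in seconds) pairs into profile rows.

    Each row is (name, total duration, percent of total, call count, mean),
    sorted by descending total duration.
    """
    totals = defaultdict(lambda: [0.0, 0])  # name -> [duration, count]
    for name, duration in calls:
        totals[name][0] += duration
        totals[name][1] += 1
    grand_total = sum(duration for duration, _ in totals.values())
    rows = [
        (name, duration, 100 * duration / grand_total if grand_total else 0.0,
         count, duration / count)
        for name, (duration, count) in totals.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical experience: 4 network round trips and 8 fetches.
calls = [("SQL*Net message from client", 2.0)] * 4 + [("FETCH", 0.25)] * 8
rows = profile(calls)
for name, duration, pct, count, mean in rows:
    print(f"{name:<27} {duration:9.6f} {pct:5.1f}% {count:5d} {mean:9.6f}")
```

Profiling one experience at a time means calling `profile` once per experience, with only that experience's calls.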
  • 23.
    3 Oracle extended SQL trace data
  • 24.
  • 25.
    Typical Oracle trace file for a connection pool

    ...
    WAIT ... nam='SQL*Net message from client' ela= 1202689 ...
    A sequence of trace lines explaining time consumption for Experience A
    WAIT ... nam='SQL*Net message from client' ela= 4260917 ...
    A sequence of trace lines explaining time consumption for Experience B
    WAIT ... nam='SQL*Net message from client' ela= 5213365 ...
    A sequence of trace lines explaining time consumption for Experience C
    WAIT ... nam='SQL*Net message from client' ela= 2044420 ...
    ...
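To work with lines like these in a script, you first have to pull the call name and the `ela` value out of each WAIT line. Here is a minimal sketch in Python, assuming the `WAIT #n: nam='...' ela= ...` layout of the 11g trace lines shown elsewhere in this deck. The regular expression and the name `parse_waits` are my own, not part of any Oracle or Method R tool.

```python
import re

# Captures the event name and the elapsed time (microseconds) of a WAIT line.
WAIT_RE = re.compile(r"^WAIT #\d+: nam='(?P<nam>[^']+)' ela= *(?P<ela>\d+)")

def parse_waits(lines):
    """Yield (event name, elapsed seconds) for every WAIT line; skip the rest."""
    for line in lines:
        match = WAIT_RE.match(line)
        if match:
            yield match.group("nam"), int(match.group("ela")) / 1_000_000

sample = [
    "WAIT #8: nam='SQL*Net message from client' ela= 20121507 "
    "driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696201698518",
    "CLOSE #8:c=0,e=41,dep=0,type=1,tim=1313696201698681",
    "WAIT #0: nam='direct path read' ela= 7 file number=4 first dba=4665 "
    "block cnt=1 obj#=86815 tim=1313696204681916",
]
waits = list(parse_waits(sample))
for nam, seconds in waits:
    print(f"{seconds:12.6f}  {nam}")
```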
  • 26.
    So, just ignore all the SQL*Net message from client. Right?
  • 27.
    BIG MISTAKE.
  • 28.
    What I didn’t mention before…
  • 29.
    These experiences like A, B, and C can have SQL*Net message from client calls in them, too.
    …That might dominate response times!
  • 30.
    WAIT ... nam='SQL*Net message from client' ela= 1202689 ...
    stuff for Experience A
    WAIT ... nam='SQL*Net message from client' ela= 342
    more stuff for Experience A
    WAIT ... nam='SQL*Net message from client' ela= 1492
    yet more stuff for Experience A
    etc.
    WAIT ... nam='SQL*Net message from client' ela= 4260917 ...
    stuff for Experience B
    WAIT ... nam='SQL*Net message from client' ela= 2928
    more stuff for Experience B
    etc.
    WAIT ... nam='SQL*Net message from client' ela= 5213365 ...
    stuff for Experience C
    WAIT ... nam='SQL*Net message from client' ela= 855
    more stuff for Experience C
    etc.
    WAIT ... nam='SQL*Net message from client' ela= 2044420 ...
  • 31.
    It’s actually a common pattern.
    Behold the network-abusing, chatty app…

    CALL-NAME                     DURATION      %   CALLS     MEAN      MIN      MAX
    --------------------------- ---------- ------ ------- -------- -------- --------
    SQL*Net message from client 200.939935  99.5% 142,520 0.001410 0.000937 0.202835
    SQL*Net message to client     0.526257   0.3% 142,520 0.000004 0.000000 0.000130
    FETCH                         0.439933   0.2% 142,518 0.000003 0.000000 0.001000
    PARSE                         0.000000   0.0%       2 0.000000 0.000000 0.000000
    EXEC                          0.000000   0.0%       2 0.000000 0.000000 0.000000
    --------------------------- ---------- ------ ------- -------- -------- --------
    TOTAL (5)                   201.906125 100.0% 427,562 0.000472 0.000000 0.202835
  • 32.
    It’s actually a common pattern.
    …and the way it should behave.

    CALL-NAME                     DURATION      %   CALLS     MEAN      MIN      MAX
    --------------------------- ---------- ------ ------- -------- -------- --------
    SQL*Net message from client   0.911041  36.5%      72 0.012653 0.000890 0.026857
    SQL*Net more data to client   0.841897  33.7%   2,688 0.000313 0.000004 0.013287
    FETCH                         0.744885  29.8%      70 0.010641 0.006999 0.012998
    PARSE                         0.001000   0.0%       2 0.000500 0.000000 0.001000
    SQL*Net message to client     0.000147   0.0%      72 0.000002 0.000001 0.000006
    EXEC                          0.000000   0.0%       2 0.000000 0.000000 0.000000
    --------------------------- ---------- ------ ------- -------- -------- --------
    TOTAL (6)                     2.498970 100.0%   2,906 0.000860 0.000000 0.026857
  • 33.
  • 34.
    Such trace files have islands of activity in an ocean of idleness.
  • 35.
  • 36.
    An island can have rivers.
  • 37.
    So can a trace file.
  • 38.
    5 How to cope with the problem
  • 39.
  • 40.
    Trace file with oceans. Find the 2.3-sec experience.

    CALL-NAME                    DURATION      %  CALLS     MEAN      MIN       MAX
    --------------------------- --------- ------ ------ -------- -------- ---------
    SQL*Net message from client 31.018640  99.3% 10,003 0.003101 0.000023 20.121507
    direct path read             0.110575   0.4% 10,000 0.000011 0.000004  0.020533
    FETCH                        0.081993   0.3%  5,001 0.000016 0.000000  0.001000
    SQL*Net message to client    0.008804   0.0% 10,003 0.000001 0.000000  0.000061
    PARSE                        0.003999   0.0%      2 0.001999 0.000000  0.003999
    EXEC                         0.001000   0.0%      2 0.000500 0.000000  0.001000
    CLOSE                        0.000000   0.0%      2 0.000000 0.000000  0.000000
    --------------------------- --------- ------ ------ -------- -------- ---------
    TOTAL (7)                   31.225011 100.0% 35,013 0.000892 0.000000 20.121507

    What percentage of this 2.3-sec experience is rivers?
  • 41.
    Trace file with no water at all. Doesn’t explain the 2.3-sec experience.

    CALL-NAME                    DURATION      %  CALLS     MEAN      MIN       MAX
    --------------------------- --------- ------ ------ -------- -------- ---------
    direct path read             0.110575  53.6% 10,000 0.000011 0.000004  0.020533
    FETCH                        0.081993  39.7%  5,001 0.000016 0.000000  0.001000
    SQL*Net message to client    0.008804   4.3% 10,003 0.000001 0.000000  0.000061
    PARSE                        0.003999   1.9%      2 0.001999 0.000000  0.003999
    EXEC                         0.001000   0.5%      2 0.000500 0.000000  0.001000
    CLOSE                        0.000000   0.0%      2 0.000000 0.000000  0.000000
    --------------------------- --------- ------ ------ -------- -------- ---------
    TOTAL (6)                    0.206371 100.0% 25,010 0.000008 0.000000  0.020533
  • 42.
    Trace file with rivers, but no oceans. Explains the 2.3-sec experience exactly. 90.9% is rivers. Easy.

    CALL-NAME                    DURATION      %  CALLS     MEAN      MIN       MAX
    --------------------------- --------- ------ ------ -------- -------- ---------
    SQL*Net message from client  2.072877  90.9% 10,001 0.000207 0.000023  0.016861
    direct path read             0.110575   4.9% 10,000 0.000011 0.000004  0.020533
    FETCH                        0.081993   3.6%  5,001 0.000016 0.000000  0.001000
    SQL*Net message to client    0.008804   0.4% 10,003 0.000001 0.000000  0.000061
    PARSE                        0.003999   0.2%      2 0.001999 0.000000  0.003999
    EXEC                         0.001000   0.0%      2 0.000500 0.000000  0.001000
    CLOSE                        0.000000   0.0%      2 0.000000 0.000000  0.000000
    --------------------------- --------- ------ ------ -------- -------- ---------
    TOTAL (7)                    2.279248 100.0% 35,011 0.000065 0.000000  0.020533
  • 43.
    To Oracle, it’s all just water.
    It sees no difference between salt water and fresh water, between response-time SQL*Net message from client (SNMFC) calls and non-response-time SNMFC calls.
    It’s all just SQL*Net message from client.
  • 44.
    However, there are big SQL*Net message from client calls
    …and little SQL*Net message from client calls.
    This is a clue.
  • 45.
    SQL*Net message from client
    if ela ≥ 1.00 sec then ocean (not response time)
    otherwise river (response time)
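The rule is simple enough to write down as code. Here is a minimal sketch in Python that applies it to the eight `ela` values from the A, B, C trace excerpt. The function name `classify_snmfc` is hypothetical; the 1.00-second default is the boundary from the slide, and the `boundary_sec` parameter leaves room to tune it.

```python
def classify_snmfc(ela_usec, boundary_sec=1.00):
    """Label one SQL*Net message from client call by its elapsed time.

    ela_usec is the trace line's ela value in microseconds.
    """
    return "ocean" if ela_usec / 1_000_000 >= boundary_sec else "river"

# The ela values from the Experience A, B, C excerpt, in file order.
elas = [1202689, 342, 1492, 4260917, 2928, 5213365, 855, 2044420]
labels = [classify_snmfc(ela) for ela in elas]
for ela, label in zip(elas, labels):
    print(f"{ela / 1_000_000:10.6f}  {label}")
```

The four oceans are the waits between experiences; the four rivers belong to the response times of A, B, and C.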
  • 46.
  • 47.
  • 48.
    SQL*Net message from client
    if ela ≥ :b then ocean (not response time)
    otherwise river (response time)

    Sometimes you have to fine-tune the boundary value.
  • 49.
    $ mrskew --rc=txnz.05 v11203_ora_26827.trc

    EXP-ID        DURATION       %   CALLS      MEAN       MIN       MAX
    -----------  ---------  ------  ------  --------  --------  --------
    0            24.236626   27.5%     327  0.074118  0.050007  0.283979
    19547         2.212251    2.5%     807  0.002741  0.000000  0.049582
    27247         2.112561    2.4%     791  0.002671  0.000000  0.048360
    24221         1.927336    2.2%     267  0.007218  0.000000  0.048210
    16129         1.450686    1.6%     683  0.002124  0.000000  0.049147
    22289         0.997744    1.1%     643  0.001552  0.000000  0.045547
    29620         0.982700    1.1%     562  0.001749  0.000000  0.049281
    2843          0.967385    1.1%     655  0.001477  0.000000  0.048986
    33239         0.920264    1.0%     139  0.006621  0.000000  0.047733
    23031         0.917492    1.0%     647  0.001418  0.000000  0.049615
    17091         0.899165    1.0%     579  0.001553  0.000000  0.045020
    14701         0.864747    1.0%     123  0.007030  0.000000  0.049502
    6509          0.805075    0.9%     437  0.001842  0.000000  0.043662
    653           0.780152    0.9%     403  0.001936  0.000000  0.048553
    36583         0.773713    0.9%     484  0.001599  0.000000  0.030175
    26287         0.767064    0.9%     619  0.001239  0.000000  0.038591
    2333          0.750920    0.9%     103  0.007290  0.000000  0.045808
    9685          0.720571    0.8%     479  0.001504  0.000000  0.047614
    25107         0.718329    0.8%     115  0.006246  0.000000  0.043572
    28487         0.715467    0.8%     107  0.006687  0.000000  0.048749
    309 others   43.717756   49.5%   8,389  0.005211  0.000000  0.049996
    -----------  ---------  ------  ------  --------  --------  --------
    TOTAL (329)  88.238004  100.0%  17,359  0.005083  0.000000  0.283979
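The report comes from mrskew, and I'm not claiming this is how the tool is implemented. But the idea of splitting a call stream into experiences at each ocean can be sketched in a few lines of Python. I'm assuming the `.05` in the command is a 0.05-second boundary, which the MIN and MAX columns suggest. The function name `island_durations` and the FETCH and EXEC durations are made up; the SQL*Net message from client values come from the A, B, C excerpt.

```python
SNMFC = "SQL*Net message from client"

def island_durations(calls, boundary_usec=50_000):
    """Split (name, ela in microseconds) calls into islands at each ocean.

    An ocean is an SNMFC call with ela >= boundary_usec. Returns the total
    duration of each island in microseconds, rivers included.
    """
    durations = []
    current = None  # None means we are in an ocean, not on an island
    for name, ela_usec in calls:
        if name == SNMFC and ela_usec >= boundary_usec:
            if current is not None:
                durations.append(current)
            current = None
        else:
            current = (current or 0) + ela_usec
    if current is not None:
        durations.append(current)
    return durations

calls = [
    (SNMFC, 1202689),   # ocean
    ("FETCH", 500),     # hypothetical database call
    (SNMFC, 342),       # river inside Experience A
    ("FETCH", 700),     # hypothetical database call
    (SNMFC, 4260917),   # ocean
    ("EXEC", 1000),     # hypothetical database call
    (SNMFC, 2928),      # river inside Experience B
    ("FETCH", 300),     # hypothetical database call
    (SNMFC, 2044420),   # ocean
]
durations = island_durations(calls)
print(durations)
```

If a river is ever longer than the boundary, this sketch will split one experience in two, which is the reliability problem with any fixed boundary value.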
  • 50.
    6 How to fix the problem
  • 51.
    For connection pooling apps, the oceans-islands-rivers thing works pretty well.
    But it’s not 100% reliable. For example, what if you have a river that’s bigger than one of your oceans?
  • 52.
    If you know your app, you know where the experience boundaries are.
  • 53.
    If you can instrument your app, it will automatically tell you where the experience boundaries are.
  • 54.
    If you’re running code in an interactive development environment, it’s easy:
    1. activate trace;
    2. execute the code path for the experience;
    3. deactivate trace;
  • 55.
    If you’re running code in an interactive development environment, it’s easy:
    1. activate trace;
       1.1. There must be NO LATENCY here.
    2. execute the code path for the experience;
       2.1. There must be NO LATENCY here.
    3. deactivate trace;
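One way to keep latency out of steps 1.1 and 2.1 is to let code, not a human, run the three steps back to back. Here is a minimal sketch in Python, assuming a DB-API style cursor on an Oracle session that has execute privilege on DBMS_MONITOR. The `traced` helper and the `RecordingCursor` stand-in are hypothetical; the stand-in only records statements so the example runs without a database.

```python
from contextlib import contextmanager

# Anonymous PL/SQL blocks. With no session id or serial number,
# DBMS_MONITOR acts on the current session.
ENABLE = "begin dbms_monitor.session_trace_enable(waits => true, binds => true); end;"
DISABLE = "begin dbms_monitor.session_trace_disable; end;"

@contextmanager
def traced(cursor):
    """Activate trace right before the experience and deactivate it right
    after, so no think time lands inside the trace file."""
    cursor.execute(ENABLE)
    try:
        yield cursor
    finally:
        cursor.execute(DISABLE)

class RecordingCursor:
    """Hypothetical stand-in for a real cursor; it only records the SQL."""
    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)

cursor = RecordingCursor()
with traced(cursor):
    cursor.execute("select count(*) from t")  # the experience being measured
print(cursor.statements)
```

With a real driver, you would pass in a cursor from your own connection instead of `RecordingCursor`.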
  • 56.
  • 57.
    This is the best thing you can do:
    Instrument your application so that the trace data explains exactly one user response time experience.
  • 58.
    You can fix a trace file that accounts for more time than you want.
    …E.g., if you’re stuck activating trace with dbms_monitor.session_trace_enable(:sid,:serial,true,true).
  • 59.
    But fixing a trace file requires either
    a) effort
    b) tools
    I’ll show you both.
  • 60.
    Two types of trace file scoping problems:
    1. Unwanted calls at the bottom
    2. Unwanted calls at the top or middle
  • 61.
    You can cut the bottom off a trace file. …With, say, vi. No problem. (Type 1)
  • 62.
    ...
    WAIT #0: nam='direct path read' ela= 7 file number=4 first dba=4665 block cnt=1 obj#=86815 tim=1313696204681916
    WAIT #0: nam='direct path read' ela= 5 file number=4 first dba=4665 block cnt=1 obj#=86815 tim=1313696204681942
    WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696204681955
    WAIT #0: nam='SQL*Net message from client' ela= 141 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696204682115
    FETCH #5:c=0,e=5,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1601196873,tim=1313696204682136
    STAT #5 id=1 cnt=5000 pid=0 pos=1 obj=86814 op='TABLE ACCESS FULL T (cr=5003 pr=0 pw=0 time=23585 us cost=11 size=10075000 card=5000)'
    WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696204682254

    *** 2011-08-18 14:36:53.506
    WAIT #5: nam='SQL*Net message from client' ela= 8824256 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696213506522
    CLOSE #5:c=0,e=22,dep=0,type=0,tim=1313696213506643
    =====================
    PARSING IN CURSOR #2 len=55 dep=0 uid=84 oct=42 lid=84 tim=1313696213506753 hv=2217940283 ad='0' sqlid='06nvwn223659v'
    alter session set events '10046 trace name context off'
    END OF STMT
    PARSE #2:c=0,e=70,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213506752
    EXEC #2:c=1000,e=354,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213507146
    ␄
  • 63.
    ...
    WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696204682254
    #
    # *** 2011-08-18 14:36:53.506
    # WAIT #5: nam='SQL*Net message from client' ela= 8824256 driver id=1650815232 #bytes=1 p3=0 obj#=86815 tim=1313696213506522
    # CLOSE #5:c=0,e=22,dep=0,type=0,tim=1313696213506643
    # =====================
    # PARSING IN CURSOR #2 len=55 dep=0 uid=84 oct=42 lid=84 tim=1313696213506753 hv=2217940283 ad='0' sqlid='06nvwn223659v'
    # alter session set events '10046 trace name context off'
    # END OF STMT
    # PARSE #2:c=0,e=70,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213506752
    # EXEC #2:c=1000,e=354,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1313696213507146
    ␄
  • 64.
    But cutting calls out of the bottom of a trace file is almost never going to be enough.
  • 65.
    Cutting calls out of the middle or the top requires magic. (Type 2)
  • 66.
    *** 2011-08-18 14:36:21.576
    *** SESSION ID:(23.42) 2011-08-18 14:36:21.576
    *** CLIENT ID:() 2011-08-18 14:36:21.576
    *** SERVICE NAME:(SYS$USERS) 2011-08-18 14:36:21.576
    *** MODULE NAME:(SQL*Plus) 2011-08-18 14:36:21.576
    *** ACTION NAME:() 2011-08-18 14:36:21.576

    WAIT #8: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696181576631

    *** 2011-08-18 14:36:41.698
    WAIT #8: nam='SQL*Net message from client' ela= 20121507 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696201698518
    CLOSE #8:c=0,e=41,dep=0,type=1,tim=1313696201698681
    =====================
    PARSING IN CURSOR #7 len=352 dep=1 uid=84 oct=3 lid=84 tim=1313696201699956 hv=2904344320 ad='3e4f6d48' sqlid='f70vdzaqjtjs0'
    SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1") FROM (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ :"SYS_B_2" AS C1, :"SYS_B_3" AS C2 FROM "T" "T") SAMPLESUB
    END OF STMT
    PARSE #7:c=0,e=402,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=1313696201699946
    ...
  • 67.
    WAIT #8: nam='SQL*Net message from client' ela= 20121507 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696201698518

    If you delete this line, then its 20.121507-second contribution to the 20.122009 seconds between calls will be unexplained.

    (1,313,696,201.698681 – 0.000041) – 1,313,696,181.576631 = 20.122009

    Here 1,313,696,201.698681 is the tim of the CLOSE #8 call in seconds, 0.000041 is that call’s e=41 µs duration, and 1,313,696,181.576631 is the tim of the preceding SQL*Net message to client call.
  • 68.
    WAIT #8: nam='SQL*Net message from client' ela= 20121507 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696201698518

    You can’t just delete this line (or set its ela value to 0).
    You must also subtract 20.121507 seconds from every *** line and tim value from there to the end of the file.
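To make the ripple concrete, here is a minimal sketch in Python that deletes one WAIT line and subtracts its `ela` from every later `tim` value. It is not mrcallrm and makes no claim about how that tool works. The name `remove_call` is hypothetical, and the sketch deliberately leaves the `***` wall-clock lines alone, so it does only part of the job.

```python
import re

TIM_RE = re.compile(r"\btim=(\d+)")
ELA_RE = re.compile(r"\bela= *(\d+)")

def remove_call(lines, index):
    """Drop the WAIT line at lines[index] and subtract its ela
    (microseconds) from the tim value of every later line."""
    ela = int(ELA_RE.search(lines[index]).group(1))

    def shift(match):
        return f"tim={int(match.group(1)) - ela}"

    return lines[:index] + [TIM_RE.sub(shift, line) for line in lines[index + 1:]]

lines = [
    "WAIT #8: nam='SQL*Net message to client' ela= 1 driver id=1650815232 "
    "#bytes=1 p3=0 obj#=-1 tim=1313696181576631",
    "WAIT #8: nam='SQL*Net message from client' ela= 20121507 "
    "driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696201698518",
    "CLOSE #8:c=0,e=41,dep=0,type=1,tim=1313696201698681",
]
fixed = remove_call(lines, 1)
for line in fixed:
    print(line)
```

After the fix, the gap between the two remaining calls is (1313696181577174 – 41) – 1313696181576631 = 502 µs, which is the 20.122009 seconds minus the 20.121507 seconds that were removed.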
  • 69.
    Cutting the top is just like cutting the middle, because of the *** lines.
  • 70.
    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    ORACLE_HOME = /app/oracle/product/11.2.0/db_1
    System name: Linux
    Node name: local-orcl
    Release: 2.6.18-194.el5
    Version: #1 SMP Mon Mar 29 20:06:41 EDT 2010
    Machine: i686
    Instance name: yyz
    Redo thread mounted by this instance: 1
    Oracle process number: 25
    Unix process pid: 10358, image: oracle@local-orcl (TNS V1-V3)

    *** 2011-08-18 14:36:21.576
    *** SESSION ID:(23.42) 2011-08-18 14:36:21.576
    *** CLIENT ID:() 2011-08-18 14:36:21.576
    *** SERVICE NAME:(SYS$USERS) 2011-08-18 14:36:21.576
    *** MODULE NAME:(SQL*Plus) 2011-08-18 14:36:21.576
    *** ACTION NAME:() 2011-08-18 14:36:21.576

    WAIT #8: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696181576631

    *** 2011-08-18 14:36:41.698
    WAIT #8: nam='SQL*Net message from client' ela= 20121507 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1313696201698518
    CLOSE #8:c=0,e=41,dep=0,type=1,tim=1313696201698681
  • 71.
    $ mrcallrm --lines=26,35078 yyz_ora_10358.trc > yyz_ora_10358-fixed.trc
  • 72.
  • 73.
    References

    http://www.slideshare.net/carymillsap/how-to-find-and-fix-your
    A free online presentation about how to instrument your application so it will automatically tell you where the experience boundaries are.

    http://method-r.com/blogs/company-blog/214-finding-connection-pool-response-times-with-method-r-tools
    “Connection pool response times with Method R Tools (Oceans, Islands, and Rivers),” a blog post explaining the oceans-islands-rivers metaphor.

    https://motdcr3.eventbrite.com
    “Mastering Oracle Trace Data free online class reunion,” to be held 11:00a–12:30p CST Thursday, February 10, 2015.

    http://amzn.to/173bpzg
    “The Method R Guide to Mastering Oracle Trace Data,” a textbook for the 1- to 2-day course that covers Method R Corporation software and methods.

    http://method-r.com/software/mrtrace
    A Method R extension for Oracle SQL Developer. Method R Trace collects trace data and retrieves it for you, automatically.

    http://method-r.com/software/mrtools
    A set of software tools for mining and manipulating Oracle extended SQL trace data. I use mrskew to report on durations of individual experiences recorded in extended SQL trace files. I use mrcallrm to eliminate calls from my trace data. It automatically ripples the required tim and *** line changes throughout a trace file.

    http://method-r.com/courses/mastering-oracle-trace-data
    “Mastering Oracle Trace Data,” a 1- to 2-day course that covers Method R Corporation software and methods.
  • 74.
  • 75.