Whitepaper: Mining the AWR repository for Capacity Planning and Visualization
Mining the AWR Repository for Capacity Planning, Visualization, and other Real World Stuff
Karl Arao
Oracle ACE, OCP-DBA, RHCE
karlarao@gmail.com
New CPUs and storage arrays are getting faster, but these resources are finite and come at a cost. Hence, capacity
planning plays a very important role to ensure proper resources are available and to handle expected and unexpected
workloads. Another critical matter for the DBAs and IT managers is justifying the expense of adding resources on the
system. With guesswork you'll end up getting the most expensive hardware. With proper measurement, proper
planning, and management of growth, you'll be able to get just the right hardware for your workload with allowance
for a particular growth period. This will result in huge savings for the company and a happier IT shop.
AWR is a built-in data store that started in 10gR1 and is very much like a "Statspack on steroids". It has improved
significantly in 11gR2, enabling you to have a far better workload information when going through all the AWR
snapshots. From the AWR data samples, we could build amazing reports that will let us notice trends and makes it
possible to visualize data and use statistical methods for analysis. Even more surprising about the AWR data samples
is we are able to define the database server's Capacity, Requirements, and Utilization in terms of CPU, IO, memory,
and network, which are very important key metrics for Capacity Planning.
In this paper you will learn how to make use of the AWR, specifically the DBA_HIST views, to have a clear-cut
measurement on resources to aid in Capacity Planning, Predictive Analysis, and Performance Firefighting.
PART 1 - Where it all started
I’m certain that DBAs or developers don’t have enough time to spare to read 108 AWR reports in a day, even more
108 AWR reports in just an hour. How about 108 AWR reports in 5 minutes just to answer the question what period is
the bottleneck? And what is the bottleneck? Well before it will take so much of my time just to generate these AWR
reports by hand, and it is a daunting and repetitive execution of awrrpt.sql. You will be overwhelmed by the manual
AWR reports generation, especially when you start reading each of them and you only need to see particular sections
of the 1000+ lines of performance data to correlate it to the problem at hand.
Definitely this will lead to longer analysis periods hence longer time for a problem to be solved and as a database
consultant I must also be aware on how to optimize my troubleshooting time. You can argue that there are visual tools
already available to help with the troubleshooting but what if you are only left with just a command line or an
SQL*Plus session?
This scenario triggered me to mine on the source tables of the AWR report to cut out the unnecessary and present the
performance data in more meaningful manner that will be easier for me to notice trends and even possible for me to
visualize the data, or even possible to do some statistics out of it.
PART 2 - How to mine the AWR
AWR is much like “Statspack on steroids”; it is a wonderful data collector for Oracle and OS statistics. The underlying
sources of the AWR report are the DBA_HIST views, which have grown from 67 in Oracle version 10.1 to 108 in
Oracle version 11.2.
The AWR report provides a single summary report based upon an interval of time. On the image below we can create
an AWR report for SNAP_ID 335 to 339, that is an interval time from 6:20 to 7:01AM. However when analyzing
performance problems we are more interested to see what occurred during each of the samples (335, 336, 337, 338 and
339) within the specified interval. In that way we have a granular view of what’s going on and have a better view on
the workload change that’s happening.
For the query output we are investigating the SYSSTAT statistic “physical reads”, which is the total number of
data blocks read from disk. It is also important to note this is a cumulative physical reads by all the database sessions
since instance start. We are particularly interested on the delta of each SNAP_ID, that is end_value - start_value =
delta, and transforming it to a more meaningful and readable output.
To transform the delta to a more meaningful output that we could easily understand we would apply the IO MB/s
formula. See the example for SNAP_ID 338 below:
IO MB/s = ( (delta * <block_size>) /1024/1024 ) / <snap_duration_in_seconds>
        = ((5663126 * 8192) /1024/1024) / 603
        = 73.37 MB/s
To validate the accuracy of the derived value we need to compare it with the actual AWR report on SNAP_ID 338 -
339. The image below shows the delta we used to derive the MB/s is correct.
Also a run of Automatic Database Diagnostic Monitor (ADDM) on SNAP_ID 338 - 339 shows that we are reaching
the throughput of 74 MB/s, which is really close to our derived value.
And checking it with the Enterprise Manager Performance page shows that the Disk IO is around our derived value.
The data shown above comes from the query below:
SELECT * FROM
( SELECT s0.snap_id snap_id,
         TO_CHAR(s0.END_INTERVAL_TIME,'YY/MM/DD HH24:MI') TIME,
         s10t0.stat_name,
         s10t0.value start_value,
         s10t1.value end_value,
         (s10t1.value - s10t0.value) delta,
         round(((((s10t1.value - s10t0.value)* 8192)/1024/1024) / ((round(EXTRACT(DAY FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 1440
                 + EXTRACT(HOUR FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 60
                 + EXTRACT(MINUTE FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME)
                 + EXTRACT(SECOND FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) / 60, 2))*60)
         ),2) as phyreads_mbps                      -- physical reads, diffed
  FROM dba_hist_snapshot s0,
       dba_hist_snapshot s1,
       dba_hist_sysstat s10t0,
       dba_hist_sysstat s10t1
  WHERE s0.dbid = 2607950532                        -- DBID
  AND s1.dbid = s0.dbid
  AND s10t0.dbid = s0.dbid
  AND s10t1.dbid = s0.dbid
  AND s0.instance_number = 1                        -- INSTANCE_NUMBER
  AND s1.instance_number = s0.instance_number
  AND s10t0.instance_number = s0.instance_number
  AND s10t1.instance_number = s0.instance_number
  AND s1.snap_id = s0.snap_id + 1
  AND s10t0.snap_id = s0.snap_id
  AND s10t1.snap_id = s0.snap_id + 1
  AND s10t0.stat_name = 'physical reads'
  AND s10t1.stat_name = s10t0.stat_name
)
WHERE snap_id in (335,336,337,338,339)
ORDER BY snap_id ASC;
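As a quick cross-check of the IO MB/s formula, the arithmetic for SNAP_ID 338 can be reproduced outside the database. This is only a minimal sketch: the function name `io_mb_per_sec` is made up for illustration, and the inputs (delta of 5663126 blocks, 8192-byte block size, 603-second snapshot) are the ones quoted in the paper.

```python
def io_mb_per_sec(delta_blocks, block_size_bytes, snap_duration_sec):
    """Turn a delta of 'physical reads' (counted in blocks) into MB/s."""
    if snap_duration_sec <= 0:
        raise ValueError("snapshot duration must be positive")
    megabytes = delta_blocks * block_size_bytes / 1024 / 1024
    return megabytes / snap_duration_sec


if __name__ == "__main__":
    # Values quoted in the paper for SNAP_ID 338.
    rate = io_mb_per_sec(5663126, 8192, 603)
    print(f"{rate:.2f} MB/s")
```

The result rounds to the same 73.37 MB/s derived by hand, which is the figure compared against the AWR report and ADDM.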
You may have noticed that I used the SQL trick below, which has a similar effect to the LAG function. It lets the
query return the start_value and end_value on a single row, making it possible to get the delta value and apply the
performance formula. The view DBA_HIST_SNAPSHOT also acts as the reference for snapshot information: joining it
to the other DBA_HIST views provides meaningful data on other subsystems or workload performance data.
AND s10t0.snap_id = s0.snap_id
AND s10t1.snap_id = s0.snap_id + 1
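The pairing idea behind that join condition can be sketched in a few lines of Python: each snapshot is matched with the one whose snap_id is exactly one higher, so the start value, end value, and delta land on one row. The function name `snapshot_deltas` and the cumulative values used here are made up for illustration; they are not taken from the paper's database.

```python
def snapshot_deltas(samples):
    """Pair each snapshot with its successor, like s1.snap_id = s0.snap_id + 1.

    samples: list of (snap_id, cumulative_value) tuples sorted by snap_id.
    Returns a list of (snap_id, start_value, end_value, delta) tuples.
    Pairs with a gap in snap_id are skipped, as the equality join would do.
    """
    rows = []
    for (snap0, value0), (snap1, value1) in zip(samples, samples[1:]):
        if snap1 == snap0 + 1:
            rows.append((snap0, value0, value1, value1 - value0))
    return rows


if __name__ == "__main__":
    # Hypothetical cumulative 'physical reads' values.
    samples = [(335, 1000), (336, 1800), (337, 2500), (339, 9000)]
    for row in snapshot_deltas(samples):
        print(row)
```

Note that 337 produces no row in this example because snapshot 338 is missing, which mirrors how the self-join silently drops a snapshot that has no successor.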
The query I’ve shown you is just one part of the story: it only gives the “IO Read MB/s”, an IO subsystem
statistic. Ideally we should correlate the following subsystems of the database server to fully characterize
the overall workload and performance:
1) Oracle
Oracle instance and database configuration
2) Operating System
CPU, memory, IO, and network
3) Application
SQLs and anything specific to the application
For the correlation we will use the “3-circle analysis” technique [1], where each subsystem represents a circle
and is diagnosed separately and then in combination. If the problem resides with the database server, the overlap of
the 3 circles is the current performance problem. By doing this we get a clear correlation of the workload and
performance across subsystems and can target our efforts to improve the overall response time.
When mining the AWR, a query in a time series layout with only the relevant statistics shown side by side is
useful in many ways. Even when the data can’t be shown side by side, each bottleneck period relates to a particular
SNAP_ID, so correlating across the various performance data is still possible.
Having this gives us the following advantages:
We quickly notice trends for performance diagnosis
We have the full set of workload and performance data under our control
We have lots of data points for statistical and predictive analysis
Analysis is faster than ever
As I go along with my research of mining the AWR I have created and collected some useful scripts that I have
applied successfully on real world performance scenarios.
The chart below shows the categorical relationship of the scripts:
The table below shows the important details of the scripts and some reason behind how they are formatted and
created:
IMPORTANT NOTE: Diagnostic Pack License is needed for the scripts
Script Name | DBA_HIST views | Data presented | Description
awr_genwl DBA_HIST_SNAPSHOT
DBA_HIST_OSSTAT
DBA_HIST_SYS_TIME_MODEL
DBA_HIST_SYSSTAT
AAS
CPU capacity
CPU requirements
Memory requirements
IO requirements
Logged on users
CPU Utilization
This is the starting point. You first run this SQL to have an
overview of the load of the database server. It clearly shows
the relationship of the formula
Utilization = Requirements / Capacity
The AAS column serves as a (golden) metric on finding the
periods where the database could be having a bottleneck or
just idle
awr_topevents DBA_HIST_SNAPSHOT
DBA_HIST_SYSTEM_EVENT
DBA_HIST_SYS_TIME_MODEL
Event
Event Rank
Waits
Time
Avgwt (ms)
DB Time %
AAS
Wait Class
This is a version of "Top 5 Timed Events" but across SNAP_IDs
with AAS metric.
Coming from the awr_genwl, for the AAS to be more useful we
must be aware about the components of AAS (much like
drilling down on the time components) and have this kind
of data over a period of time (across SNAP_IDs).
Graphing this data will be much like the Enterprise Manager
that outputs a nice graph and slicing the AAS components to
different wait classes giving you a broad “historical” view which
you could go back and drill down on the past load activity.
awr_services DBA_HIST_SNAPSHOT
DBA_HIST_SERVICE_STAT
Service Name
DB Time
DB CPU
Physical Reads
Logical Reads
AAS
Service enables the grouping of common database connections
or allowing the distribution of connections (e.g. RAC).
This data is commonly seen on the Enterprise Manager to give
us a classification of the application/module activity on the
database.
Showing this data in a time series manner and adding an AAS
column will give us an idea if particular applications are driving
most the workload of the database.
awr_sysstat DBA_HIST_SNAPSHOT
DBA_HIST_OSSTAT
DBA_HIST_SYS_TIME_MODEL
DBA_HIST_SYSSTAT
AAS
LIO/s
DB Block Changes/s
User Calls/s
Parses/s
Hard Parses/s
Sorts/s
Logon/s
SQL*NET to client MB
SQL*NET to dblink MB
This is a version of "Load Profile" but across SNAP_IDs with
AAS metric.
Useful to quickly notice a change in the Oracle workload. You may
add any other SYSSTAT statistics you want to monitor here.
awr_topsqlx DBA_HIST_SNAPSHOT
DBA_HIST_SQLSTAT
DBA_HIST_SQLTEXT
SQL_ID
Plan Hash Value
Module
Elapsed Time (s)
Elapsed Time / exec (s)
CPU Time (s)
IO Time (s)
App Time (s)
Concurrency Time (s)
Cluster Wait (s)
LIO
PIO
Direct Writes
Rows
Exec
Parse Count
PX Exec
Time Rank
AAS
SQL_TEXT
The “SQL section” of the AWR report is usually segregated into
sections ordered by the following:
Elapsed Time
CPU Time
Gets
Reads
Executions
Parse Calls
Having the data for a particular problematic SQL_ID
spread over 1000+ lines of report makes it hard to find every
detail about its performance.
I feel there’s a better way to present the data. Here are
the sections you'll get from the script, each with a short
description:
1) snap_id, time, instance, snap duration
The time period and snap_id can be used to show the SQLs for a given
workload period. Let's say your usual work hours are 9am to 6pm: you
can show just the SQLs that ran in that period. There's a data range
section at the bottom of the script that you can use if you want to
filter.
2) sql_id, plan_hash_value, module
You can use this info if you want to know where the SQL was
executed from (SQL*Plus, OWB, Toad, etc.). You can also compare the
plan_hash_value, but I suggest you use Kerry Osborne's
awr_unstable_plans.sql script if you'd like to search for
unstable plans.
3) total elapsed time, elapsed time per exec
- cpu time
- io time
- app wait time
- concurrency wait time
- cluster wait time
These are the time components. Even without tracing the SQL you'd
know which time component is consuming the elapsed time of that
particular SQL. Let's say your total elapsed time is 1000sec, with
cpu time of 30sec and io time of 300sec: you would know that the SQL
is consuming significant IO, but you still have to look for the
other 670sec, which could be attributed to "other" wait
events (like PX Deq Credit: send blkd, etc.).
4) - LIOs
- PIOs
- direct writes
- rows
- executions
- parse count
- PX
Some other statistics about the SQL: whether you're incurring a lot of
PIOs, how many times this SQL was executed in that period, and the # of
PX spawned. Just be careful about these numbers: if you have
"executions" of, let's say, 8, you have to divide these values by 8,
and the same goes for the time section. Only the "elapsed time per
exec" is a per execution value. This is for formatting reasons
because I can't fit them all on my screen.
5) - AAS (Average Active Sessions)
- Time Rank
- SQL type, SQL text
This is one of my favorites. It measures how the SQL is
performing against the database server. I'm using the AAS &
CPU count as my yardstick for a possible performance problem
(I suggest reading Kyle's stuff about this):
if AAS < 1
-- Database is not blocked
AAS ~= 0
-- Database basically idle
-- Problems are in the APP not DB
AAS < # of CPUs
-- CPU available
-- Database is probably not blocked
-- Are any single sessions 100% active?
AAS > # of CPUs
-- Could have performance problems
AAS >> # of CPUs
-- There is a bottleneck
So having the AAS as another metric on the TOP SQL is good
stuff. I've also added the "time rank" column to show the SQL's
ranking among the top SQL. Normally the default settings of the
script will show time rank 1 to 5. This can also be useful if you are
looking into a particular SQL that is on rank #15 and you see that
there's an ad hoc query at time rank #1 and #2 affecting the
database performance.
This script can also show SQLs that span across SNAP_IDs. I
would order the output by SNAP_ID and filter on that particular
SQL. You would then see that if the SQL is still running and spans
across, let's say, 2 SNAP_IDs, the exec count is 0 (zero) and the
elapsed time per exec is 0 (zero). Only when the query has finished
will you see these values populated. I've noticed this behavior and
it's the same thing that is shown on the AWR reports. You can go
here for that scenario:
http://karlarao.tiddlyspot.com/#%5B%5BTopSQL%20on%20AWR%5D%5D
awr_topsql DBA_HIST_SNAPSHOT
DBA_HIST_SQLSTAT
DBA_HIST_SQLTEXT
SQL_ID
Plan Hash Value
Module
Elapsed Time (s)
Elapsed Time / exec (s)
CPU Time (s)
Cluster Wait (s)
LIO
PIO
Rows
Exec
Parse Count
PX Exec
Time Rank
AAS
Similar columns from awr_topsqlx but this time just showing
the top 20 SQLs across SNAP_IDs.
awr_unstable_plans
(by Kerry Osborne)
DBA_HIST_SNAPSHOT
DBA_HIST_SQLSTAT
SQL_ID
Executions
Min,Max,Avg Etime
Avg LIO
STD_DEV
This script finds SQL statements with plan instability. I like the
clever use of standard deviation to show SQLs with variable
elapsed time.
awr_parm_mods
(by Kerry Osborne)
DBA_HIST_SNAPSHOT
DBA_HIST_PARAMETER
V$INSTANCE
Parameter Name
Old Value
New Value
This script shows all parameters (including hidden) that have
been modified.
awr_netwl DBA_HIST_SYSMETRIC_SUMMARY
Network Minvalue (MB)/s
Network Maxvalue (MB)/s
Network Avgvalue (MB)/s
Network STD_DEV (MB)/s
The data comes from the metric family of tables, which shows
“Network Traffic Volume Per Sec”.
Keep in mind that metrics are different from sysstat values. With
sysstat you just get the delta and the rate. With metrics the
sampling is different: let's say the snap duration is 10 minutes;
the metric is sampled at 60sec intervals (num_interval) and the
max, min, avg, and std_dev of those samples are recorded.
awr_est_gc_traffic
(by John Kanagaraj)
DBA_HIST_SNAPSHOT
DBA_HIST_SYSSTAT
DBA_HIST_DLM_MISC
V$DATABASE
V$PARAMETER
Estimated Interconnect
Traffic (KB)
This script is ideal for RAC environment and shows the
interconnect throughput of an instance. Very useful if you want
to check if the interconnect is being saturated.
awr_iowl DBA_HIST_SNAPSHOT
DBA_HIST_OSSTAT
DBA_HIST_SYS_TIME_MODEL
DBA_HIST_SYSSTAT
AAS
CPU IO WAIT Utilization
OS Load
Single Block R/W IOPS
Multi Block R/W IOPS
R/W MB/s
Total R/W IOPS
R/W Ratio
HW Disk IOPS
HW # of Disks
This script is ideal for monitoring the Oracle IO activity. Very
useful for sizing and consolidating storage for Oracle
databases. This can be used together with a storage
monitoring tool to have a complete picture of IO performance.
The last two columns use the formulas that storage engineers
apply to determine the number of disks needed by the
database:
HW Disk IOPS = (IOPS * Read Ratio) + (IOPS * Write Ratio *
RAID penalty)
HW # of Disks = Total disk IOPS / IOPS per disk
Of course the “HW # of Disks” is not the final number. There
are other factors (bandwidth, throughput, service time, etc.)
that need to be considered to determine the right storage for a
particular IO workload, but this can be your starting point.
Benchmarking will also help a lot with the storage decisions.
awr_io_ts DBA_HIST_SNAPSHOT
DBA_HIST_FILESTATXS
DBA_HIST_TEMPSTATXS
Tablespace R/W IOPS
Tablespace R/W latency
This script shows the IO performance of the tablespaces. This
is the same as what you see in AWR but across SNAP_IDs.
The latency formula is as follows:
latency (ms) = (readtim / phy reads) * 10
Keep in mind that in this script the IOPS and latency values
are aggregated from all the datafiles of the tablespace. So
diagnosing latency issues using this script may not show
the actual numbers of any single datafile, but the textual trends
of high latency (ms) numbers can warn you and point you to the
particular workload periods worth probing with small duration
samples.
awr_io_file DBA_HIST_SNAPSHOT
DBA_HIST_FILESTATXS
DBA_HIST_TEMPSTATXS
Datafile R/W IOPS
Datafile R/W latency
This script shows the IO performance of the datafiles. This is
the same as what you see in AWR but across SNAP_IDs.
Keep in mind that the IOPS and latency values may be
averaged out (normalized) if the snap interval is long (60 minutes
and above) compared to a 5 second sample or a 10 minute snap
interval (see Appendix).
r2toolkit [2] DBA_HIST_SNAPSHOT
DBA_HIST_DATABASE_INSTANCE
DBA_HIST_SYSSTAT
DBA_HIST_SYSTEM_EVENT
DBA_HIST_SYS_TIME_MODEL
DBA_HIST_OSSTAT
DBA_HIST_WR_CONTROL
Y and X values that can
be plotted for Linear
Regression
This is a performance toolkit that uses AWR data and Linear
Regression to identify what metric/statistic is driving the
database server’s workload. The data points can be very useful
for capacity planning giving you informed decisions and
completely avoiding guesswork!
You can also do the same kind of mining with Statspack. The DBA_HIST views below each have a Statspack counterpart,
so you can achieve similar results:
DBA_HIST_SNAPSHOT = STATS$SNAPSHOT
DBA_HIST_OSSTAT = STATS$OSSTAT
DBA_HIST_SYS_TIME_MODEL = STATS$SYS_TIME_MODEL
DBA_HIST_SYSSTAT = STATS$SYSSTAT
The scripts mentioned are freely downloadable, and you will find more details on the math and performance formulas
(rates, time, IOPS, CPU, latency, utilization, AAS) when you look into the SQL code. If you are serious about mining
the AWR, I also suggest that you take time to play further with the DBA_HIST tables and the underlying data. You’ll
come away with a better understanding of how the data on the plain AWR report are derived.
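Two of the formulas quoted in the script table above (the storage sizing pair given for awr_iowl and the latency formula given for awr_io_ts) can be tried out with a small sketch. The function names are hypothetical, and the sample inputs (1000 IOPS, a 70/30 read/write split, a RAID write penalty of 4, 180 IOPS per disk) are illustrative assumptions, not figures from the paper. The latency helper assumes readtim is expressed in hundredths of a second, which is what the "* 10" in the formula implies.

```python
import math


def hw_disk_iops(total_iops, read_ratio, write_ratio, raid_penalty):
    """HW Disk IOPS = (IOPS * Read Ratio) + (IOPS * Write Ratio * RAID penalty)."""
    return total_iops * read_ratio + total_iops * write_ratio * raid_penalty


def hw_disks_needed(total_disk_iops, iops_per_disk):
    """HW # of Disks = Total disk IOPS / IOPS per disk, rounded up to whole disks."""
    return math.ceil(total_disk_iops / iops_per_disk)


def read_latency_ms(readtim, phys_reads):
    """latency (ms) = (readtim / phy reads) * 10, with readtim in centiseconds."""
    if phys_reads == 0:
        return 0.0  # no reads in the interval, so there is no latency to report
    return readtim / phys_reads * 10


if __name__ == "__main__":
    backend_iops = hw_disk_iops(1000, 0.7, 0.3, 4)   # assumed workload and RAID penalty
    print(backend_iops, hw_disks_needed(backend_iops, 180))
    print(read_latency_ms(500, 1000))
```

As the table entry for awr_iowl already warns, the disk count from such a calculation is only a starting point; bandwidth, service time, and benchmarking results still have to be considered.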
PART 3 - Visualization
Average Active Sessions (AAS) has become my default (golden) metric for finding the periods where the database
could be having a bottleneck or is just idle. Essentially AAS is the database load. This value should not go above the
CPU count (NUM_CPUS in DBA_HIST_OSSTAT); if it does, the database is working very hard
or waiting a lot for something.
Together, the AAS & CPU count are used as a yardstick for a possible performance problem [3]:
If AAS < 1
-- Database is not blocked
AAS ~= 0
-- Database basically idle
-- Problems are in the APP not DB
AAS < # of CPUs
-- CPU available
-- Database is probably not blocked
-- Are any single sessions 100% active?
AAS > # of CPUs
-- Could have performance problems
AAS >> # of CPUs
-- There is a bottleneck
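The yardstick above can be turned into a small helper for tagging rows of a time series. This is only a sketch: `classify_aas` is a made-up name, and the two thresholds (`idle_threshold` for "~= 0" and `bottleneck_factor` for ">>") are assumptions, since the yardstick itself gives no numbers for them. The CPU counts in the usage lines are hypothetical too.

```python
def classify_aas(aas, num_cpus, idle_threshold=0.1, bottleneck_factor=2.0):
    """Map an AAS value and a CPU count to one of the yardstick categories.

    idle_threshold and bottleneck_factor are illustrative assumptions.
    """
    if aas < 0:
        # The paper notes that shutdown periods show up as negative values.
        raise ValueError("negative AAS, treat the snapshot as invalid")
    if aas <= idle_threshold:
        return "idle"                          # AAS ~= 0
    if aas < 1:
        return "not blocked"                   # AAS < 1
    if aas < num_cpus:
        return "cpu available"                 # AAS < # of CPUs
    if aas < num_cpus * bottleneck_factor:
        return "possible performance problem"  # AAS > # of CPUs
    return "bottleneck"                        # AAS >> # of CPUs


if __name__ == "__main__":
    # AAS values 2.2 and 3.5 are the spike range quoted in the paper;
    # the CPU counts are hypothetical.
    for aas, cpus in [(0.05, 4), (2.2, 4), (3.5, 2), (3.5, 1)]:
        print(aas, cpus, classify_aas(aas, cpus))
```

Keeping the thresholds as parameters makes it easy to tighten or loosen what counts as "much greater than" the CPU count for a given server.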
Just like a doctor's stethoscope, AAS can be your first instrument when investigating performance problems, but it
doesn’t stop there. For it to be more useful you must be aware of the components of AAS, much like drilling down on
the time components, and have this kind of data over a period of time (across SNAP_IDs). Enterprise Manager
draws these nice graphs on the “Performance” and “Top Activity” pages, slicing the AAS components into different
“Wait Classes”, and it has a “Historical” view where you can go back and drill down on the past load activity.
But what could be the problem?
I know some of you have encountered this Enterprise Manager error at some point. You are configured with a
long AWR retention period (365 days to exaggerate it) but Enterprise Manager won’t let you go back farther, all
because there was an instance shutdown between the date you want to go and the date you are now. Or could be
some other issue where Enterprise Manager really can’t just give you the visualization you need.
So what could be the alternative?
1) Use the Top Timed Events SQL (awr_topevents.sql) and focus on the AAS and wait class columns across
SNAP_IDs
2) Or use the script together with Perfsheet! … a great tool for ad-hoc performance visualization [4]
To be consistent with the initial example we will focus on the same interval time 6:20 to 7:01 AM, that is
SNAP_ID 335-339. Note that the AAS during this period had a sudden spike that is on the range of 2.2 to 3.5
Figure: awr_genwl.sql output
The image below is a stacked area chart of the awr_topevents.sql using Perfsheet. It’s clear from the image that
there’s a big spike on the database load… but we want to know more about it by drilling down on the AAS
components.
Figure: Stacked area chart of AAS
Looking at the “textual trends” of awr_topevents.sql output, just by looking at the AAS column we would easily
know which AAS component is driving the workload of the database. For the particular SNAP_IDs of high
activity, it’s evident that there’s a high User IO activity.
Figure: awr_topevents.sql output
Some more background
On the Enterprise Manager “Performance” and “Top Activity” Page you’ll see the AAS components are sliced
into different wait classes. But, did you know that their data sources are different?
From the 2nd slide of Kyle Hailey’s presentation [3] on AAS (Average Active Sessions) it says that there are two
ways to derive the value:
1) Time Statistics
2) Sampling
AAS on the Performance Page uses “Time Statistics” and is actually from v$system_event + CPU from time
model. This is also what the script awr_topevents.sql is doing… it unions the output of “events” on
DBA_HIST_SYSTEM_EVENT and the “CPU” from time model view DBA_HIST_SYS_TIME_MODEL
and then filter only the top 5 and do this across the SNAP_IDs, but for graphing purposes on the Perfsheet to make
it look similar to the Enterprise Manager Performance Page I have to include all of the “events” so that all the
AAS values will be counted. By the way, on 10g below the load chart is coming from v$system_event + v$sysstat
“CPU used by this session”.
AAS on the Top Activity Page uses “Sampling” and by default is taking advantage of ASH (samples) and does it
on a 15sec refresh rate… but as I have observed when you switch from the Real Time 15 sec
refresh to Historical then it also starts to behave like the Performance Page (pulls data from v$system_event +
CPU from time model).
So what’s the effect?
On a high CPU activity period you’ll notice that there will be a higher AAS on the Top Activity Page compared
to Performance Page. Simply because ASH samples every second and it does that quickly on every active
session (the only way to see CPU usage real time) while the time model CPU although it updates quicker (5secs I
think) than v$sysstat “CPU used by this session” there could still be some lag time and it will still be based on
Time Statistics (one of two ways to calculate AAS) which could be affected by averages.
If you want more info about the details around the Performance and Top Activity page this is worth reading -
History of Session Load [5] and AAS investigation [14]
Now time for Perfsheet a la Enterprise Manager!
Finding the AAS component that’s driving the workload is a lot easier in graphics. The image below shows that
we can create the same visualization like the Enterprise Manager broken down into “Wait Class”.
Figure: Stacked area chart AAS components – wait class
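Going back to the two ways of deriving AAS mentioned earlier (time statistics and sampling), the difference is easier to see with numbers. The sketch below uses hypothetical function names and made-up inputs. It assumes the usual definitions: the time-statistics flavor divides accumulated wait time plus CPU time by the elapsed time of the interval, while the sampling flavor averages the number of active sessions seen at each sample.

```python
def aas_from_time_stats(wait_time_sec, cpu_time_sec, elapsed_sec):
    """Time statistics: (non-idle wait time + CPU time) / elapsed time."""
    if elapsed_sec <= 0:
        raise ValueError("elapsed time must be positive")
    return (wait_time_sec + cpu_time_sec) / elapsed_sec


def aas_from_samples(active_session_counts):
    """Sampling: average count of active sessions over all samples."""
    if not active_session_counts:
        raise ValueError("at least one sample is needed")
    return sum(active_session_counts) / len(active_session_counts)


if __name__ == "__main__":
    # Hypothetical 600-second snapshot: 900s of waits plus 420s of CPU.
    print(aas_from_time_stats(900, 420, 600))
    # Hypothetical per-sample counts of active sessions.
    print(aas_from_samples([2, 3, 2, 4, 1]))
```

Because the time-statistics value is an average over the whole interval, a short burst that the samples catch can look smaller there, which is consistent with the higher AAS the paper reports on the Top Activity page during high CPU periods.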
Even more, we have the data now in our control. So we could play around with the data and create interesting
graphs. Below is broken down into “Wait Events”; aside from being more colorful it lets you see what wait event
is mostly consuming the AAS.
Figure: Stacked area chart AAS components – wait events
Ooops, don’t get too excited.. important reminder… the 2-dimensional Stacked area chart that Enterprise Manager
uses could hide important information and sometimes could be misleading [13] and it really helps to have another
view and see the data clearly separated into their respective components, rather than being stacked… As you
compare the above and below charts, you’ll know what I mean.. Wait Class and Wait Events in 3D area
chart view could tell a more meaningful story.
Compare the wait class chart… above notice the blue (Other wait class) on the range of AAS of 1 while below
it’s on the range of 0.1 (hidden between CPU and System IO)… that’s a big difference!
Then compare the wait event chart… notice the big difference on the chart? Above you can’t really tell what’s
happening.. but on 3D you can see that only the db file sequential read and direct path read are on the AAS of
1.6 on SNAP_ID 335 and 336. Yes, you will also not be fooled when you look at the raw data… but visualization
is much easier and the way to go but you must be able to sense and validate if it’s driving you to bad conclusions.
Figure: 3D area chart AAS components – wait class
Figure: 3D area chart AAS components – wait events
AAS throughout the AWR retention period!
On my test machine I have 365 days retention period. This enables me to have a data warehouse of performance
data. You can see from the chart below (stacked area chart), that what we are focusing on (6:20 to 7:01 AM
SNAP_ID 335-339) happens to be the highest load period from all the AAS samples for the lifetime of my test
database. You could also see the period of shutdowns (negative value) and other time period where AAS went
beyond my maximum CPU which could justify the drill down on the specific SNAP_IDs or time frame… from
there you could use ASH, run the AWR report, run ADDM, or make use of your high caliber scripts!
The good thing here is, you are not guessing!
PART 4 - Capacity Planning
Utilization is the ultimate metric!
Capacity planning plays a very important role to ensure proper resources are available and be able to handle
expected and unexpected workloads. The primary principle is to ensure the application workload requirements
will fit into the available capacity of the database server. And with this we need to have a facility for workload
measurement [7]. Good thing the data collection process is already being done by AWR. We just need to extract
and present it in a more meaningful and useful manner.
Measuring the workload will give us the following advantages and benefits [7]:
Have enough capacity and not over buy
Enable us to quantify the results of response time optimizations in the savings of system resources
Enable us to quantify the benefit of workload reduction
On the Introduction to Oracle Server Consolidation paper [6] and Chapter 9 of Craig Shallahamer’s book [8] he
explained in detail what information you need to get for you to be able to define the Database Server’s Capacity,
Requirements, and Utilization
Essentially what we care most in Capacity Planning is the database server utilization and it is represented by this
formula
Utilization = Requirements / Capacity
As shown on the image below the “empty pitcher” represents the database server capacity while the “glass with
water” and “another pitcher with beer” are the Oracle workload requirements. Typically the IT shop makes a
decision to purchase the database server, that is, they define the capacity. Then they start pouring the applications
into the server. And of course, the application requirement may or may not fit nicely on the available database
server capacity. And when this doesn’t occur nicely, there can be an excess of capacity, which means IT spent too
much, or it could be the other way around where the capacity is not enough for the requirements at hand.
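The formula Utilization = Requirements / Capacity is easy to try with CPU as the resource. In the sketch below, the way capacity and requirements are expressed (CPU-seconds available versus CPU-seconds consumed over one snapshot interval) is an assumption made for illustration, as are the function names and the sample numbers.

```python
def cpu_capacity_sec(num_cpus, interval_sec):
    """CPU-seconds available during the interval (assumed definition)."""
    return num_cpus * interval_sec


def utilization(requirements, capacity):
    """Utilization = Requirements / Capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return requirements / capacity


if __name__ == "__main__":
    # Hypothetical numbers: 2 CPUs, a 600-second snapshot, 900 CPU-seconds consumed.
    capacity = cpu_capacity_sec(2, 600)
    print(f"{utilization(900, capacity):.0%}")
```

A value above 1.0 from such a calculation would correspond to the case described above where the requirements no longer fit into the pitcher.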
This simple and very useful concept can be applied as well in AWR. Using the awr_genwl.sql script the data is
presented in a manner that we can easily abstract the performance statistics to the Utilization formula.
Having the data presented this way, we can easily apply filter to the data set and immediately find the workload
periods with high workload requirements.
And we could do other filtering as well…

AAS range
    aas > 1
Per SNAP_ID or range of SNAP_IDs
    id in (336)
    where id >= 336 and id <= 340
Oracle CPU Utilization
    oracpupct > 50
OS CPU Utilization
    oscpupct > 50
Particular Workload periods
    AND TO_CHAR(s0.END_INTERVAL_TIME,'D') >= 1         -- Day of week: 1=Sunday 7=Saturday
    AND TO_CHAR(s0.END_INTERVAL_TIME,'D') <= 7
    AND TO_CHAR(s0.END_INTERVAL_TIME,'HH24MI') >= 0900 -- Hour
    AND TO_CHAR(s0.END_INTERVAL_TIME,'HH24MI') <= 1800
    AND s0.END_INTERVAL_TIME >= TO_DATE('2010-jan-17 00:00:00','yyyy-mon-dd hh24:mi:ss') -- Data range
    AND s0.END_INTERVAL_TIME <= TO_DATE('2010-aug-22 23:59:59','yyyy-mon-dd hh24:mi:ss')

CPU sizing recommendations

Having this data output can be easily used as inputs to CPU sizing of a database server.

The data points below came from an actual production server that needs to be migrated to a new machine. The server is a dual core machine and been used for almost 8 years and there have been a couple of hardware errors occurred. The management would like to know what would be the ideal machine and how many cores will be needed to handle the workload of the database.

The formula used to derive the “CPU core need” [9] is as follows:

    core need = # of cores * utilization * 1.25

The data points were very useful to characterize the current utilization of the database server. Since it was collocated to a data center, we could opt to just upgrade to a newer model but not the latest and the greatest or we can virtualize it to a newer hardware.
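As a worked example of the “CPU core need” formula, here is a small Python sketch. The dual core count comes from the server described in this section, but the 60% utilization figure is a made-up input, not a measurement from that server. Treating utilization as a fraction between 0 and 1, and reading the 1.25 multiplier as extra headroom, are my assumptions; the formula itself does not say what the factor stands for.

```python
import math


def cpu_core_need(cores, utilization, factor=1.25):
    """core need = # of cores * utilization * 1.25.

    utilization is assumed to be a fraction (0.60 means 60% busy).
    factor is the 1.25 multiplier from the formula; treating it as
    headroom is an assumption.
    """
    return cores * utilization * factor


# Hypothetical input: the dual core server observed at 60% CPU utilization.
need = cpu_core_need(cores=2, utilization=0.60)

# Cores come in whole units, so round up before comparing hardware options.
whole_cores = math.ceil(need)
print(need, whole_cores)
```

The result is expressed in cores of the current machine, so when moving to a newer CPU model the per-core speed difference still has to be considered separately.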
But notice the outlier (shown in red above) representing a SNAP period having high CPU utilization. Statistically summarizing the data will tell me that I’m most of the time at <10% CPU utilization, but we don’t want to ignore the outlier just like that because there might be a critical application process on that workload period. Validating with the application owner, she confirmed that it was indeed an adhoc process that is being done once a year. Having this information, we can safely remove the outlier from the data points, and even if the adhoc process will run again on the new server we just have to make sure that it’s being run on an off-peak period to not affect the overall connected users.

Storage sizing recommendations

Having this data output can be easily used as inputs to storage sizing of a database server.
The data points below came from awr_genwl.sql as well, sizing storage for the same production system mentioned above. This shows the IOPS requirements needed to run the database on the new environment. This can be used together with a storage monitoring tool to have a complete picture of IO performance. Having the measured data easily transforms requirements to capacity.

Also take note that there are other factors (bandwidth, throughput, service time, etc.) that need to be considered to determine the right storage for a particular IO workload but this can be your starting point. Also benchmarking will help a lot on the storage decisions.

For storage sizing purposes, I strongly recommend using the awr_iowl.sql.
Real World Example

Diagnosing and Resolving GC Block Lost

The graph shown was a sudden slowdown on a client running 2 nodes of RAC, and it’s a period of month end processing so it’s the most critical week of the month. Interviewing the DBA, he would insist that they have not done any changes on the database environment… well, that would be what the majority of the customers will say on a performance problem, so the task of finding where/when/why it went wrong is all left to us.

So it’s a sudden slowdown, and I was thinking… if I can have time series performance of both of the nodes plotted in one graph… that would answer a lot of questions. Coming from Tanel Poder’s seminar in Singapore, I was able to apply the things that I have learned. So I made use of Perfsheet and played around with the visualization and I was able to achieve what I have envisioned.

On this image above you can see the where, when, and why. Most of the load is on the first node. And on the peaks are the particular periods we are interested in. And what wait events are contributing on the peak is a suspect or possible culprit for the performance problem. Drilling down further on those peak periods and on the particular database sessions running critical modules that are slow, plus correlating it with the database advisors and OS statistics (CPU, memory, network), we were able to conclude that it was a network interconnect switch problem.

If it weren’t for this visualization the troubleshooting would have taken longer.

This is the image after replacing the network interconnect switch… this shows their normal workload.
Linear Regression of AAS and CPU on 2 node RAC

Mining the AWR backed by solid statistical analysis [10] [11] [12] lets you do forecast that can guide you with targeted response time optimizations and workload reduction.

The graph shown below is a scatter plot of a production environment with 2 nodes of 11gR1 RAC running on 8core HS21 Bladeserver on a DS4800 SAN. Notice the strong correlation coefficient (R2) of .97 and .89 respectively, which shows a strong correlation between AAS vs. CPU utilization. Also when CPU starts to queue at >80% the AAS also shoots up!

On the drill down shown below on the peak period with AAS value of 10, it shows that the workload is driven by high load SQL greatly affecting the overall performance of the database. Also note that the large chunk of AAS component being utilized is on “CPU”, hence you will see large LIOs and most of the elapsed time spent on CPU when looking at the SQL details on awr_topsqlx. Tuning the high load SQL will result in great workload reduction, response time optimization, and huge savings on system resources.

If the server’s workload is on the AAS value of 2.2, the CPU utilization, latency, and AAS component on “CPU” seem to be low. Also you will notice the top SQL from AAS of 10 is not there anymore.

[Scatter plots: Node 1, Node 2]
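For readers who want to see where an R2 number like the ones above comes from, here is a minimal least-squares sketch in plain Python. It is not the r2toolkit code; the function name linear_fit_r2 and the sample points are made up. It follows the usual convention where x is the independent value (here CPU utilization) and y is the dependent value being predicted (here AAS).

```python
def linear_fit_r2(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    # R2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Made-up samples, one point per snapshot: CPU utilization (%) and AAS.
cpu_pct = [10, 20, 30, 40, 50, 60, 70, 80]
aas = [0.5, 1.1, 1.4, 2.1, 2.4, 3.1, 3.4, 4.2]

intercept, slope, r2 = linear_fit_r2(cpu_pct, aas)
print(round(slope, 4), round(r2, 4))
```

An R2 close to 1 means the straight line explains most of the variation in y. It says nothing about points outside the measured range, which is why the queueing behavior above 80% CPU still has to be looked at on the plot itself.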
The performance toolkit uses AWR data and Linear Regression to identify what metric/statistic is driving the database server’s workload based on AAS. The data points can be very useful for capacity planning giving you informed decisions and completely avoiding guesswork!

The toolkit contains 7 sections, see brief description below:
- CREATE USER - creates the r2toolkit user
- DROP TABLES - drop the tables for a fresh start
- CREATE THE r2 TABLES - create the main tables
- POPULATE y data - y data is the "dependent value", variable whose value is to be predicted
- ANALYZE r2 VALUES - get the stat names with high r2 values, to have a more accurate analysis
- POPULATE x and residual data - x data is the "independent value", used to predict the value of y
- R2 REPORT - generate the textual report and r2 values with or w/o outliers

Drilling down on the peak workload... with AAS of 10

1) General Workload report
2) Tablespace IO report
3) Top Timed Events
4) Top 20 SQLs
6) Top 5 SQLs of SNAP_ID 8631.. which by the way got an AAS of 10

Now on the low workload period… with AAS of 2.2
1) General Workload report
2) Tablespace IO report
3) Top Timed Events
4) Top 20 SQLs
No entry – the top SQL from AAS of 10 is not here anymore
6) Top 5 SQLs on SNAP_ID 8582

References

[1] Craig Shallahamer - Oracle Performance Firefighting - Chapter 1
[2] r2project - http://karlarao.tiddlyspot.com/#r2project
[3] Kyle Hailey Seminar – AAS presentation
[4] Tanel Poder – Perfsheet http://www.tanelpoder.com/files/PerfSheet.zip
[5] History of session load - http://sites.google.com/site/youvisualize/active-session-history
[6] Craig Shallahamer - Introduction To Oracle Server Consolidation
[7] Andy Rivenes – Oracle Workload Measurement
[8] Craig Shallahamer - Oracle Performance Firefighting - Chapter 9
[9] Husnu Sensoy - Database Consolidation Best Practices
http://husnusensoy.files.wordpress.com/2010/05/database-consolidation-best-practices.pdf
[10] Forecasting Oracle Performance
[11] Statistics Without Tears
[12] Neeraj Bahatia – Linear Regression Paper
[13] Neil Gunther & Tanel Poder - Multidimensional Visualization of Oracle Performance using Barry007
http://arxiv.org/pdf/0809.2532
[14] AAS investigation http://goo.gl/5WaAg

Other references:
o http://karlarao.wordpress.com
o Storage IOPS, capacity, performance, cost - http://goo.gl/FCN0w
o http://karlarao.tiddlyspot.com/#Statistics
o http://karlarao.tiddlyspot.com/#OraclePerformance

Appendix - Average Latency Issue

The IO latency formula used in AWR is as follows:

    latency (ms) = (readtim / phy reads) * 10

The images below show that latency values may be normalized if the snap interval is too long as compared to shorter snap intervals. Also read on this link for the effects of CPU scheduling issues on latency
http://www.freelists.org/post/oracle-l/Disk-Device-Busy-What-exactly-is-this,7
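The averaging effect is easy to reproduce with a few numbers. The sketch below applies the latency formula to six made-up 10-minute snapshot deltas and then to the same data rolled up into a single 60-minute interval. The function name and all the values are hypothetical, and readtim is assumed to be in centiseconds, which is what the "* 10" conversion to milliseconds implies.

```python
def avg_read_latency_ms(readtim_cs, phyrds):
    """latency (ms) = (readtim / phy reads) * 10, with readtim assumed in centiseconds."""
    if phyrds == 0:
        return 0.0  # no reads in the interval, nothing to average
    return readtim_cs / phyrds * 10


# Made-up deltas for six 10-minute snapshots: (readtim in centiseconds, physical reads).
# The third interval contains a latency spike.
ten_minute_deltas = [
    (500, 1000),
    (520, 1000),
    (6000, 1200),
    (480, 1000),
    (510, 1000),
    (490, 1000),
]

# Latency per short interval: the spike is clearly visible.
per_snap = [avg_read_latency_ms(t, r) for t, r in ten_minute_deltas]

# Same data seen through one 60-minute snapshot: the spike is averaged away.
total_time = sum(t for t, _ in ten_minute_deltas)
total_reads = sum(r for _, r in ten_minute_deltas)
hourly = avg_read_latency_ms(total_time, total_reads)

print(per_snap, round(hourly, 1))
```

With these inputs the worst 10-minute interval sits well above the hourly figure, which is the reason to prefer shorter snap intervals when latency is what you are investigating.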