The Oracle Database AWR performance repository is a hidden treasure. It holds many useful details about your systems' behavior. This presentation is designed to give you the knowledge you need to get more out of that data than the standard AWR-based reports allow. The author will walk you through several practical examples from his experience in which AWR proved to be one of the best sources of information. You will learn how to start accessing AWR tables and a few areas where you should be careful. We will wrap up the presentation with more examples and a Q&A section.
Objective 1: Give enough information to start mining AWR tables to extract performance data for troubleshooting different issues
Objective 2: Demonstrate practical examples on how AWR has been used to troubleshoot different performance problems
Objective 3: Encourage you to consider AWR as a good additional source for troubleshooting performance issues
7. Background
• AWR is one of many RDBMS performance data sources
• Sometimes it isn’t the best source (aggregation)
• SQL Extended trace (event 10046)
• RAW trace
• tkprof
• TRCAnlzr [ID 224270.1]
• Method R state-of-the-art tools
• PL/SQL Profiler
• LTOM (Session Trace Collector)
• others
• Sometimes it is the best/most efficient source!
• Sometimes it is the only one available!
8. Background
• Once I was called to troubleshoot high load
• Connected to the database, I saw 8 active processes that had been running for 6 hours on average
• Used the 10046 event on all 8 processes for 15 minutes
• Found several SQLs returning 1 row a million times
• Passed the results to development, asking them to fix the logic
• Spent ~2 hours finding where the issue was
• Next day a colleague asked me
• Why did you use 10046 and spend 2 hours?
• He used an AWR report and came up with the same answer in less than 5 minutes
• Lesson learned: use the right tool for the right case!
9. When should you consider AWR mining?
• General resource tuning (high CPU, IO utilization)
• Find TOP resource consuming SQLs
• You are asked to reduce server load X times
• You would like to analyze load patterns/trends
• You need to travel back in time and see how things progressed
• You don’t have any other source of performance information
• AWR reports don’t give you the information at the right angle/dimension, or are not available (Grid Control, awrrpt.sql)
• Analysis of historical SQL execution plan information in AWR
10. When is it better to use other methods?
• You need to tune a procedure/function/activity
• You have a repeatable test case
• The problem can be reproduced in an idle environment
• There is no concurrent resource usage
• SQL Trace (10046) is a far better troubleshooting method in such cases
• When the application doesn’t use bind variables
11. TOP CPU/IO Consuming SQLs ?
select
SQL_ID,
sum(CPU_TIME_DELTA),
sum(DISK_READS_DELTA),
count(*)
from
DBA_HIST_SQLSTAT
group by
SQL_ID
order by
sum(CPU_TIME_DELTA) desc
/
SQL_ID SUM(CPU_TIME_DELTA) SUM(DISK_READS_DELTA) COUNT(*)
------------- ------------------- --------------------- ----------
05s9358mm6vrr 27687500 2940 1
f6cz4n8y72xdc 7828125 4695 2
5dfmd823r8dsp 6421875 8 15
3h1rjtcff3wy1 5640625 113 1
92mb1kvurwn8h 5296875 0 1
bunssq950snhf 3937500 18 15
7xa8wfych4mad 2859375 0 2
...
12. TOP CPU Consuming SQLs ?
select
s.SQL_ID,
sum(s.CPU_TIME_DELTA),
sum(s.DISK_READS_DELTA),
count(*)
from
DBA_HIST_SQLSTAT s
group by
s.SQL_ID
order by
sum(s.CPU_TIME_DELTA) desc
13. TOP CPU Consuming SQLs ?
select * from
(
select
s.SQL_ID,
sum(s.CPU_TIME_DELTA),
sum(s.DISK_READS_DELTA),
count(*)
from
DBA_HIST_SQLSTAT s
group by
s.SQL_ID
order by
sum(s.CPU_TIME_DELTA) desc
)
where rownum < 11
/
14. TOP CPU Consuming SQLs ?
select * from
(
select
s.SQL_ID,
sum(s.CPU_TIME_DELTA),
sum(s.DISK_READS_DELTA),
count(*)
from
DBA_HIST_SQLSTAT s, DBA_HIST_SNAPSHOT p
where 1=1
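and s.SNAP_ID = p.SNAP_ID
and s.DBID = p.DBID -- SNAP_ID alone is not unique across databases/instances
and s.INSTANCE_NUMBER = p.INSTANCE_NUMBER -- needed on RAC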
and EXTRACT(HOUR FROM p.END_INTERVAL_TIME) between 8 and 16
group by
s.SQL_ID
order by
sum(s.CPU_TIME_DELTA) desc
)
where rownum < 11
/
15. TOP CPU Consuming SQLs ?
select * from
(
select
s.SQL_ID,
sum(s.CPU_TIME_DELTA),
sum(s.DISK_READS_DELTA),
count(*)
from
DBA_HIST_SQLSTAT s, DBA_HIST_SNAPSHOT p
where 1=1
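and s.SNAP_ID = p.SNAP_ID
and s.DBID = p.DBID -- SNAP_ID alone is not unique across databases/instances
and s.INSTANCE_NUMBER = p.INSTANCE_NUMBER -- needed on RAC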
and EXTRACT(HOUR FROM p.END_INTERVAL_TIME) between 8 and 16
and p.END_INTERVAL_TIME between SYSDATE-7 and SYSDATE
group by
s.SQL_ID
order by
sum(s.CPU_TIME_DELTA) desc
)
where rownum < 11
/
16. TOP CPU Consuming SQLs ?
select * from
(
select
s.SQL_ID,
sum(s.CPU_TIME_DELTA),
sum(s.DISK_READS_DELTA),
count(*)
from
DBA_HIST_SQLSTAT s, DBA_HIST_SNAPSHOT p, DBA_HIST_SQLTEXT t
where 1=1
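and s.SNAP_ID = p.SNAP_ID
and s.DBID = p.DBID -- SNAP_ID alone is not unique across databases/instances
and s.INSTANCE_NUMBER = p.INSTANCE_NUMBER -- needed on RAC
and s.SQL_ID = t.SQL_ID
and s.DBID = t.DBID
and EXTRACT(HOUR FROM p.END_INTERVAL_TIME) between 8 and 16
and t.COMMAND_TYPE != 47 -- Exclude PL/SQL blocks from output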
and p.END_INTERVAL_TIME between SYSDATE-7 and SYSDATE
group by
s.SQL_ID
order by
sum(s.CPU_TIME_DELTA) desc
)
where rownum < 11
/
20. AWR = DBA_HIST_% objects
• 223 objects => 11.2.0.4.0
• 243 objects => 12.1.0.1.0
• I use just a few on a regular basis (each shown with the V$ view it snapshots)
• DBA_HIST_ACTIVE_SESS_HISTORY - V$ACTIVE_SESSION_HISTORY
• DBA_HIST_SEG_STAT - V$SEGMENT_STATISTICS
• DBA_HIST_SQLSTAT - V$SQL
• DBA_HIST_SQL_PLAN - V$SQL_PLAN
• DBA_HIST_SYSSTAT - V$SYSSTAT ( ~SES~ )
• DBA_HIST_SYSTEM_EVENT - V$SYSTEM_EVENT ( ~SESSION~ )
• Most of the views contain data snapshots from V$___ views
• Some views have DELTA columns (e.g. DISK_READS_DELTA)
• DBA_HIST_SEG_STAT
• DBA_HIST_SQLSTAT
21. AWR Things to keep in mind …
• The data are just snapshots of V$ views
• Data is collected based on thresholds (default: top 30)
• Some data is excluded based on thresholds
• Some data may not be in the SGA at the time of the snapshot
• The longer the time between snapshots, the more data gets excluded
• For data mining, use ALL snapshots available
[Figure: timeline with Begin and End snapshots on a time axis t]
22. AWR Things to keep in mind …
• Forget about AWR if there are literals in the code
• An indicator is a high “parse count (hard)” rate (10-50 per second)
• cursor_sharing = FORCE (use very carefully)
• In a RAC configuration, do not forget the INSTANCE_NUMBER column in joins (INST_ID in GV$ views)
• Most of the V$ (DBA_HIST) performance views have incremental counters, so use END - BEGIN values
• You may get wrong results (sometimes negative)
• Sometimes counters reach their max value and get reset
• Counters are reset at instance restart
• The time between snapshots may differ
• Suggestion: (ENDv - BEGINv)/(ENDs - BEGINs) = value/sec
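The value/sec suggestion and the reset caveat can be sketched outside the database. This is a minimal Python illustration, not the author's code. It assumes you have already fetched (snapshot end time, cumulative counter value) pairs, for example from DBA_HIST_SNAPSHOT joined to DBA_HIST_SYSSTAT, and it uses the gap between consecutive snapshot end times as the interval. The function name and the sample values are hypothetical.

```python
from datetime import datetime

def rates_per_second(snapshots):
    """Turn cumulative counter snapshots into per-second rates.

    `snapshots` is a list of (end_time, value) tuples ordered by time.
    Returns a list of (end_time, rate). The rate is None where the
    counter went backwards (instance restart or counter wrap), because
    the true delta cannot be recovered from the two values alone.
    """
    rates = []
    for (prev_time, prev_value), (end_time, value) in zip(snapshots, snapshots[1:]):
        seconds = (end_time - prev_time).total_seconds()
        delta = value - prev_value
        if delta < 0 or seconds <= 0:
            rates.append((end_time, None))  # reset detected: skip, do not report a negative rate
        else:
            rates.append((end_time, delta / seconds))
    return rates

# Hypothetical sample: intervals of different length, then a restart.
sample = [
    (datetime(2013, 8, 20, 10, 0), 1000),
    (datetime(2013, 8, 20, 11, 0), 4600),   # +3600 in 3600 s
    (datetime(2013, 8, 20, 11, 30), 6400),  # +1800 in 1800 s
    (datetime(2013, 8, 20, 12, 30), 200),   # counter reset
]

for end_time, rate in rates_per_second(sample):
    print(end_time, rate)
```

Dividing by the interval length makes the 60-minute and 30-minute snapshots comparable, and the reset shows up as a gap rather than a misleading negative value.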
24. AWR Things to keep in mind …
• Seconds count between 2 snapshots
select
s.BEGIN_INTERVAL_TIME,
s.END_INTERVAL_TIME,
s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME DTIME, -- Returns “Interval”
EXTRACT(HOUR FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME) H,
EXTRACT(MINUTE FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME) M,
EXTRACT(SECOND FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME) S,
EXTRACT(HOUR FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME)*60*60+
EXTRACT(MINUTE FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME)*60+
EXTRACT(SECOND FROM s.END_INTERVAL_TIME-s.BEGIN_INTERVAL_TIME) SECS,
phy_get_secs(s.END_INTERVAL_TIME,s.BEGIN_INTERVAL_TIME), -- Write your own function
(cast(s.END_INTERVAL_TIME as date) - cast(s.BEGIN_INTERVAL_TIME as date))
*24*60*60
from
DBA_HIST_SNAPSHOT s
where 1=1
and s.INSTANCE_NUMBER = (select INSTANCE_NUMBER from V$INSTANCE)
and s.DBID = (select DBID from V$DATABASE)
order by
s.BEGIN_INTERVAL_TIME;
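One detail worth checking in any "seconds between snapshots" helper: an expression built only from the HOUR, MINUTE and SECOND fields of an interval ignores the DAY field, so it undercounts when two snapshots are more than 24 hours apart (for example after a long outage). The Python sketch below illustrates that arithmetic only. Both function names are hypothetical and the timestamps are made-up sample values.

```python
from datetime import datetime

def interval_seconds(begin_time, end_time):
    """Full elapsed seconds, including whole days."""
    return (end_time - begin_time).total_seconds()

def hms_only_seconds(begin_time, end_time):
    """Seconds computed from the sub-day part only.

    Mirrors an HOUR*3600 + MINUTE*60 + SECOND calculation that
    leaves out the DAY component of the interval.
    """
    delta = end_time - begin_time
    return delta.seconds + delta.microseconds / 1e6

# Hypothetical snapshots 25 hours apart.
begin = datetime(2013, 8, 13, 19, 0, 30)
end = datetime(2013, 8, 14, 20, 0, 30)

print(interval_seconds(begin, end))   # full interval
print(hms_only_seconds(begin, end))   # day part dropped
```

With the default one-hour snapshot interval the two results agree, which is why the difference is easy to miss until a gap appears in the snapshot history.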
25. AWR Things to keep in mind …
select SNAP_INTERVAL, RETENTION
from
DBA_HIST_WR_CONTROL c, V$DATABASE d
where
c.DBID = d.DBID;
SNAP_INTERVAL RETENTION
------------------------------ ------------------------------
+00000 01:00:00.0 +00007 00:00:00.0
select DBID, INSTANCE_NUMBER, count(*) C,
min(BEGIN_INTERVAL_TIME) OLDEST, max(BEGIN_INTERVAL_TIME) YOUNGEST
from
DBA_HIST_SNAPSHOT
group by
DBID,
INSTANCE_NUMBER;
DBID INSTANCE_NUMBER C OLDEST YOUNGEST
---------- --------------- ---------- ------------------------- -------------------------
3244685755 1 179 13-AUG-13 07.00.30.233 PM 21-AUG-13 05.00.01.855 AM
3244685755 2 179 13-AUG-13 07.00.30.309 PM 21-AUG-13 05.00.01.761 AM
26. Trends Analysis Example (1) …
select
s.BEGIN_INTERVAL_TIME, s.END_INTERVAL_TIME,
(
t.VALUE-
LAG (t.VALUE) OVER (ORDER BY s.BEGIN_INTERVAL_TIME)
) DVALUE,
(t.VALUE-LAG (t.VALUE) OVER (ORDER BY s.BEGIN_INTERVAL_TIME))/
phy_get_secs(s.END_INTERVAL_TIME, s.BEGIN_INTERVAL_TIME) VAL_SEC
from
DBA_HIST_SNAPSHOT s,
DBA_HIST_SYSSTAT t
where 1=1
and s.SNAP_ID = t.SNAP_ID
and s.DBID = t.DBID
and s.INSTANCE_NUMBER = t.INSTANCE_NUMBER
and s.INSTANCE_NUMBER = (select INSTANCE_NUMBER from V$INSTANCE)
and s.DBID = (select DBID from V$DATABASE)
and t.STAT_NAME = 'parse count (hard)'
order by
s.BEGIN_INTERVAL_TIME;
DBA_HIST_SYSSTAT & DBA_HIST_SYSTEM_EVENT
28. Trends Analysis Example (1) …
select
s.BEGIN_INTERVAL_TIME, s.END_INTERVAL_TIME,
(
t.VALUE-
LAG (t.VALUE) OVER (ORDER BY s.END_INTERVAL_TIME)
) DVALUE,
(t.VALUE-LAG (t.VALUE) OVER (ORDER BY s.END_INTERVAL_TIME))/
phy_get_secs(s.END_INTERVAL_TIME, s.BEGIN_INTERVAL_TIME) VAL_SEC
from
DBA_HIST_SNAPSHOT s,
DBA_HIST_SYSSTAT t
where 1=1
and s.SNAP_ID = t.SNAP_ID
and s.DBID = t.DBID
and s.INSTANCE_NUMBER = t.INSTANCE_NUMBER
and s.INSTANCE_NUMBER = (select INSTANCE_NUMBER from V$INSTANCE)
and s.DBID = (select DBID from V$DATABASE)
and t.STAT_NAME = 'parse count (hard)'
order by
s.END_INTERVAL_TIME;
DBA_HIST_SYSSTAT & DBA_HIST_SYSTEM_EVENT
29. SQL Bad performance Example (2) …
• Called by a user to troubleshoot a badly performing SQL
• Sometimes the SQL hangs (never finishes) and needs to be killed
and re-executed
• Upon re-execution, it always finishes successfully in a few
minutes
• The client demanded a resolution ASAP …
30. SQL Bad performance Example (2) …
select
st.SQL_ID
, st.PLAN_HASH_VALUE
, sum(st.EXECUTIONS_DELTA) EXECUTIONS
, sum(st.ROWS_PROCESSED_DELTA) CROWS
, trunc(sum(st.CPU_TIME_DELTA)/1000000/60) CPU_MINS
, trunc(sum(st.ELAPSED_TIME_DELTA)/1000000/60) ELA_MINS
from DBA_HIST_SQLSTAT st
where st.SQL_ID in (
'5ppdcygtcw7p6'
,'gpj32cqd0qy9a'
)
group by st.SQL_ID , st.PLAN_HASH_VALUE
order by st.SQL_ID, CPU_MINS;
DBA_HIST_SQLSTAT
33. SQL Bad performance Example (2) …
• The result …
• Two different jobs were gathering statistics on a daily basis
1. “ANALYZE …” as part of another batch job (developer)
2. “DBMS_STATS…” traditional (DBA)
• Sometimes “DBMS_STATS…” did not complete before the batch job started (+/- 10 minutes).
• After the job got killed (typically about 10 minutes after it started), the new “correct” statistics were in place.
• Takeaways …
A. Don’t change your statistics that frequently (they should be consistent)
B. AWR data helps to spot such issues easily
34. SQL Plan flipping Example (3) …
• I asked myself: Well!
• If we find that the execution plan for one SQL has changed from a good (fast) one to a bad (slow) one, are other SQLs affected by a similar issue?
• And if there are, how many are there?
• Would SQL Profiles (baselines, outlines) help address those?
35. SQL Plan flipping Example (3) …
SELECT st2.SQL_ID ,
st2.PLAN_HASH_VALUE ,
st_long.PLAN_HASH_VALUE l_PLAN_HASH_VALUE ,
st2.CPU_MINS ,
st_long.CPU_MINS l_CPU_MINS ,
st2.ELA_MINS ,
st_long.ELA_MINS l_ELA_MINS ,
st2.EXECUTIONS ,
st_long.EXECUTIONS l_EXECUTIONS ,
st2.CROWS ,
st_long.CROWS l_CROWS ,
st2.CPU_MINS_PER_ROW ,
st_long.CPU_MINS_PER_ROW l_CPU_MINS_PER_ROW
FROM
(SELECT st.SQL_ID ,
st.PLAN_HASH_VALUE ,
SUM(st.EXECUTIONS_DELTA) EXECUTIONS ,
SUM(st.ROWS_PROCESSED_DELTA) CROWS ,
TRUNC(SUM(st.CPU_TIME_DELTA) /1000000/60) CPU_MINS ,
DECODE( SUM(st.ROWS_PROCESSED_DELTA), 0 , 0 ,
(SUM(st.CPU_TIME_DELTA)/1000000/60)/SUM(st.ROWS_PROCESSED_DELTA) ) CPU_MINS_PER_ROW ,
TRUNC(SUM(st.ELAPSED_TIME_DELTA) /1000000/60) ELA_MINS
FROM DBA_HIST_SQLSTAT st
WHERE 1 =1
AND ( st.CPU_TIME_DELTA !=0
OR st.ROWS_PROCESSED_DELTA !=0)
GROUP BY st.SQL_ID,
st.PLAN_HASH_VALUE
) st2,
(SELECT st.SQL_ID ,
st.PLAN_HASH_VALUE ,
SUM(st.EXECUTIONS_DELTA) EXECUTIONS ,
SUM(st.ROWS_PROCESSED_DELTA) CROWS ,
TRUNC(SUM(st.CPU_TIME_DELTA) /1000000/60) CPU_MINS ,
DECODE( SUM(st.ROWS_PROCESSED_DELTA), 0 , 0 ,
(SUM(st.CPU_TIME_DELTA)/1000000/60)/SUM(st.ROWS_PROCESSED_DELTA) ) CPU_MINS_PER_ROW ,
TRUNC(SUM(st.ELAPSED_TIME_DELTA) /1000000/60) ELA_MINS
FROM DBA_HIST_SQLSTAT st
WHERE 1 =1
AND ( st.CPU_TIME_DELTA !=0
OR st.ROWS_PROCESSED_DELTA !=0)
HAVING TRUNC(SUM(st.CPU_TIME_DELTA)/1000000/60) > 10
GROUP BY st.SQL_ID,
st.PLAN_HASH_VALUE
) st_long
WHERE 1 =1
AND st2.SQL_ID = st_long.SQL_ID
AND st_long.CPU_MINS_PER_ROW/DECODE(st2.CPU_MINS_PER_ROW,0,1,st2.CPU_MINS_PER_ROW) > 2
ORDER BY l_CPU_MINS DESC,
st2.SQL_ID,
st_long.CPU_MINS DESC,
st2.PLAN_HASH_VALUE;
36. SQL Plan flipping Example (3) …
SELECT
...
FROM
(SELECT st.SQL_ID ,
st.PLAN_HASH_VALUE ,
...
DECODE( SUM(st.ROWS_PROCESSED_DELTA), 0 , 0 ,
(SUM(st.CPU_TIME_DELTA)/1000000/60)/SUM(st.ROWS_PROCESSED_DELTA) ) CPU_MINS_PER_ROW ,
...
FROM DBA_HIST_SQLSTAT st
WHERE 1 =1
...
GROUP BY st.SQL_ID,
st.PLAN_HASH_VALUE
) st2,
(SELECT st.SQL_ID ,
st.PLAN_HASH_VALUE ,
...
HAVING trunc(sum(st.CPU_TIME_DELTA)/1000000/60) > 10
GROUP BY st.SQL_ID,
st.PLAN_HASH_VALUE
) st_long
WHERE 1 =1
AND st2.SQL_ID = st_long.SQL_ID
AND st_long.CPU_MINS_PER_ROW/DECODE(st2.CPU_MINS_PER_ROW,0,1,st2.CPU_MINS_PER_ROW) > 2
ORDER BY l_CPU_MINS DESC,
st2.SQL_ID,
st_long.CPU_MINS DESC,
st2.PLAN_HASH_VALUE;
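The comparison behind this query can be restated as a short Python sketch. It is an illustration of the logic under stated assumptions, not a replacement for the SQL. It assumes the per-(SQL_ID, PLAN_HASH_VALUE) aggregates have already been fetched into dictionaries. The function name, dictionary keys, and sample rows are hypothetical. The thresholds (more than 10 CPU minutes, more than 2 times the CPU cost per row) come from the query above.

```python
def find_plan_flips(plans, min_cpu_mins=10, ratio=2):
    """Flag SQL_IDs where one plan costs far more CPU per row than another.

    `plans` is a list of dicts with keys sql_id, plan_hash_value,
    cpu_mins and rows (one dict per SQL_ID/plan pair).
    Returns (sql_id, other_plan, slow_plan) tuples.
    """
    def cpu_per_row(plan):
        # Same guard as DECODE(rows, 0, 0, cpu/rows)
        return plan["cpu_mins"] / plan["rows"] if plan["rows"] else 0

    flips = []
    for slow in plans:
        # Same filter as HAVING TRUNC(cpu minutes) > 10
        if int(slow["cpu_mins"]) <= min_cpu_mins:
            continue
        for other in plans:
            if other["sql_id"] != slow["sql_id"]:
                continue
            baseline = cpu_per_row(other) or 1  # mirrors DECODE(x, 0, 1, x)
            if cpu_per_row(slow) / baseline > ratio:
                flips.append((slow["sql_id"], other["plan_hash_value"], slow["plan_hash_value"]))
    return flips

# Hypothetical aggregates: statement "a" has a cheap and an expensive plan.
sample_plans = [
    {"sql_id": "a", "plan_hash_value": 111, "cpu_mins": 2.0, "rows": 1000},
    {"sql_id": "a", "plan_hash_value": 222, "cpu_mins": 50.0, "rows": 1000},
    {"sql_id": "b", "plan_hash_value": 333, "cpu_mins": 40.0, "rows": 4000},
]

print(find_plan_flips(sample_plans))
```

A plan compared with itself always gives a ratio of 1, so statements with a single plan never appear in the output, just as in the self-join on SQL_ID.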
38. SQL Plan flipping Example (3) …
• The result …
• Load on the system was reduced 5 times
• Takeaways …
A. SQL plans may flip from good plans to …
B. SQL Outlines/Profiles may help sometimes
C. AWR provides good input for such analysis
• Why may SQL plans flip?
1. Bind variable peeking / adaptive cursor sharing
2. Statistics changes (including differences in partition stats)
3. Adding/removing indexes
4. Session/system init.ora parameters (nls_sort/optimizer_mode)
5. Dynamic statistics gathering (sampling)
6. Profiles/Outlines/Baselines evolution
39. Conclusions …
• AWR = DBA_HIST% views (snapshots from V$% views)
• Sometimes it is the only source of information
• AWR contains much more information than the default AWR reports and Grid Control can provide
• Be careful mining the data (there are some gotchas)
• Don’t be afraid to discover/mine the AWR data
I can show you the door …
… but it is you who should walk through it
Thank you!
Mission: remember to consider AWR the next time you troubleshoot a performance issue!