CREATING A ROBUST PERFORMANCE STRATEGY
Anibal.Garcia@Atos.net
#AgarciaDBA
Over 18 years of experience with Oracle; started in 1996.
Vice-President of the Oracle User Group in Guatemala since 2014
Member of LAOUC (Latin America Oracle User Community) since 2012
Distinguished member of the Oracle Hispanic Community of Latin America
(http://comunidadoraclehispana.ning.com)
Co-writes a blog with Fernando Garcia, Oracle ACE from Argentina
(https://oracleconprecision.wordpress.com/about)
… In summary, a self-taught person, like you.
PROPERTIES OF PERFORMANCE
POINT OF VIEW (END USER)
• They expect some benefit from a business perspective.
• They will always verify the result of our work in their own system.
• Their measurement is based on a perception of historical performance.
POINT OF VIEW (TECHNICAL USER)
• Optimize system resources according to user requests.
• Results need to be logged and compared against an SLA.
• Measurement is based on response time.
THROUGHPUT INDICATORS
Improving throughput means optimizing components and load-balancing resources to relieve contention. A strategy includes indicators for both, plus a method to focus your work across the different components, in order to achieve better performance (decreased response time).
TUNING EVERYTHING?
The challenge is to meet the performance expectations that users define as acceptable. A lack of consistent results in operation, whether from the introduction of new system components or from growth in customer requests, ends with users demanding that the system be tuned.
THE GOLD PIECE (AVG RESPONSE TIME)
Scalability tests and performance monitoring show each component of the system in detail, pointing to the average response time. The goal is to achieve an acceptable average response time that complies with the user's SLA, with the least variation, so that it is predictable.
Now you need to use the average, the variance, the standard deviation, and the index of dispersion (variance-to-mean ratio, VMR) to observe how predictable a process is.
Average (or mean): 24
Variance: 313.1111
Standard deviation: 17.69495
Variance-to-mean ratio (VMR): 13.0463
The mean is similar in both examples (Example 1 and Example 2), but the dispersion of the data is very different: a VMR > 1 means the data is skewed far away from the mean.
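As a minimal sketch of how to compute these statistics in Oracle SQL, assuming a hypothetical log table response_log with one elapsed_time row per request (both names are assumptions, not from the original demo):

select round(avg(elapsed_time), 3)                          mean,
       round(variance(elapsed_time), 3)                     variance,
       round(stddev(elapsed_time), 3)                       std_dev,
       -- index of dispersion: variance divided by the mean
       round(variance(elapsed_time) / avg(elapsed_time), 3) vmr
  from response_log;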
HOW PRECISE IS THIS TOOL?
The mean does not change dramatically, but the index of dispersion gives us an idea that something happened. You can then inspect the detail around 2 PM.
WHERE DO YOU NEED TUNING?
ANALYZE THE CASES
AVG RESPONSE TIME FOR THE SAME QUERY
It is easy to see the maximum limit (tolerance).
When more than 50% of executions fall below the mean, the query is a candidate for reviewing why it sometimes takes too long to finish.
When more than 50% fall above the mean, it is a candidate for reviewing the execution plan.
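As a hedged sketch, assuming a hypothetical per-execution log sql_exec_log(sql_id, elapsed_secs) (names are assumptions), you could compute what share of each query's executions fall below its own mean:

select sql_id,
       -- percentage of executions faster than the query's own mean
       round(100 * avg(case when elapsed_secs < mean_secs then 1 else 0 end), 1) pct_below_mean
  from (select sql_id,
               elapsed_secs,
               avg(elapsed_secs) over (partition by sql_id) mean_secs
          from sql_exec_log)
 group by sql_id;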
DEMO 1 (JOBS): ORACLE COLLECTED ALL THE TIMES
create table job_timings_test1
as
select job_name,
       count(job_name) executions,
       round(avg(elapsed_time), 3) average,
       round(median(elapsed_time), 3) median,
       round(variance(elapsed_time), 3) variance,
       round(variance(elapsed_time) / avg(elapsed_time), 3) vmr
  from job_history
 where start_dat_time > '2013-12-31 23:59:59'
   and start_dat_time < '2014-01-01 00:00:01'
   and elapsed_time != 0
 group by job_name
 order by job_name;
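As a hedged follow-up (not part of the original demo), you could then read the table back ordered by VMR, so that the least predictable jobs surface first:

select job_name, executions, average, median, variance, vmr
  from job_timings_test1
 order by vmr desc;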
ROBUST PERFORMANCE
Taguchi Methods for Robust Design by Yuin Wu
and Alan Wu (ASME Press, 2000) defines robustness as follows:
… the condition used to describe a product or process
design that functions with limited variability in spite of
diverse and changing environmental conditions, wear, or
component-to-component variation. A product or process
is robust when it has limited or reduced functional
variation even in the presence of noise.
https://en.wikipedia.org/wiki/Taguchi_methods
ROBUST PERFORMANCE
Uncontrollable variables are referred to as noise factors. There are three types of noise factors, and they are usually explained in manufacturing or product-design terms. However, these types can easily be associated with the “uncontrollable” variables that affect database systems:
• External causes: users or operators; any system issue external to the database
• Internal causes: data growth or changes to code within the database
• Component-to-component factors: system resource contention or system-wide changes that affect more than the intended process
USUAL APPROACH TO SEEING PERFORMANCE
CREATE YOUR OWN VERSION TO REVIEW PERFORMANCE
(LOG THE ELAPSED TIME)
Go time: DBMS_UTILITY.get_time (at start)
Stop time: DBMS_UTILITY.get_time (at stop)
Elapsed time: stop time - go time
CPU go time: DBMS_UTILITY.get_cpu_time (at start)
CPU stop time: DBMS_UTILITY.get_cpu_time (at stop)
CPU elapsed time: CPU stop time - CPU go time
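A minimal PL/SQL sketch of this pattern; the my_timings log table is an assumption, not from the original slides. Both DBMS_UTILITY functions return hundredths of a second:

declare
  l_go      number;
  l_cpu_go  number;
  l_elapsed number;
  l_cpu     number;
begin
  l_go     := dbms_utility.get_time;      -- elapsed clock, in centiseconds
  l_cpu_go := dbms_utility.get_cpu_time;  -- session CPU, in centiseconds
  -- ... the work you want to measure goes here ...
  l_elapsed := dbms_utility.get_time - l_go;
  l_cpu     := dbms_utility.get_cpu_time - l_cpu_go;
  -- my_timings is an assumed log table (run_date date, elapsed_cs number, cpu_cs number)
  insert into my_timings (run_date, elapsed_cs, cpu_cs)
  values (sysdate, l_elapsed, l_cpu);
  commit;
end;
/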
UNDERSTANDING THE METRIC IN ORACLE
FIND THE SLOWEST AND LEAST PREDICTABLE QUERIES
Column ct (count of times executed) = Bucket1 + Bucket2 + Bucket3 + Bucket4 + Bucket5
Example
SQL_ID 1kahr9dw1rqbu: executed 3,942 times, avg = 7 secs
Bucket1 = [0-98 secs]: 3,862 times (97%); Bucket5 = [396-490 secs]: 20 times
It is less predictable than SQL_ID 06amz8tcpwtuz, and the VMR shows this.
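A hedged sketch of how such buckets could be built with Oracle's WIDTH_BUCKET function, again assuming the hypothetical per-execution log sql_exec_log(sql_id, elapsed_secs):

select sql_id,
       count(*) ct,
       -- five equal-width buckets of 98 secs over the range [0, 490)
       count(case when width_bucket(elapsed_secs, 0, 490, 5) = 1 then 1 end) bucket1,
       count(case when width_bucket(elapsed_secs, 0, 490, 5) = 2 then 1 end) bucket2,
       count(case when width_bucket(elapsed_secs, 0, 490, 5) = 3 then 1 end) bucket3,
       count(case when width_bucket(elapsed_secs, 0, 490, 5) = 4 then 1 end) bucket4,
       -- bucket 5 plus the overflow bucket (values >= 490)
       count(case when width_bucket(elapsed_secs, 0, 490, 5) >= 5 then 1 end) bucket5
  from sql_exec_log
 group by sql_id;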
DEMO 2 (BACKUPS)
select sid, start_time, end_time
  from v$rman_status
 where operation = 'BACKUP';

select ss.snap_id,
       ss.instance_number node,
       ss.begin_interval_time,
       s.sql_id,
       s.plan_hash_value,
       nvl(s.executions_delta, 0) execs,
       (s.elapsed_time_delta /
        decode(nvl(s.executions_delta, 0), 0, 1, s.executions_delta)) / 1000000 avg_etime
  from dba_hist_sqlstat s, dba_hist_snapshot ss
 where s.sql_id = 'ccyax0n7jfa7j'
   and ss.snap_id = s.snap_id
   and ss.instance_number = s.instance_number
   and s.executions_delta > 0
 order by 1, 2, 3;
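The two views can be related. As a hedged sketch (not part of the original demo), this flags AWR snapshots that overlap a backup window, so you can check whether spikes in avg_etime coincide with backups:

select ss.snap_id, ss.begin_interval_time, ss.end_interval_time
  from dba_hist_snapshot ss
 where exists (select 1
                 from v$rman_status r
                where r.operation = 'BACKUP'
                  -- backup interval overlaps the snapshot interval
                  and r.start_time < ss.end_interval_time
                  and r.end_time   > ss.begin_interval_time);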
DEMO 3 (QUERY ELAPSED TIME)
SUMMARY
WHY DO WE NEED TO CREATE THESE METRICS?
By working on measuring the system, we also gain more knowledge of how everything works.
We get the opportunity to test our hypotheses and refute the speculations of other groups arguing from a precarious position (“the application did not change”, “the data is almost the same size”, “there is no large variation in users”, etc.).
A good DBA does not work from assumptions but analyzes the processes using the evidence in the metrics.
QUESTIONS
Anibal.Garcia@Atos.net
#AgarciaDBA
Do you want more demos?
• Network activity
• Spikes of connections
• Balancing workload in RAC
• Memory/CPU bottlenecks
• … etc.
