Drill down on Object Statistics      • Object statistics...            > Which objects are doing the most IO?            >...
Using the Trace Wait Interface    • Oracle tracing is a lot like truss or Dtrace for the database.          > What is a pa...
Response Time Profiling Trace Data     • Collect *.trc file as previously shown via oradebug or ???     • Analyze files wi...
Transcript of "Oracle analysis 101_v1.0_ext"

  1. Oracle Analysis 101
     Simple techniques to help analyze performance
     • Glenn.Fawcett@Sun.com
       > http://blogs.sun.com/~glennf
       > Sr. Staff Engineer
       > Performance Technologies Group
     oracle_analysis_101 (12/9/08) Glenn.Fawcett@sun.com
  2. Goal Statements
     Introduce basic techniques that are required to better collect and analyze Oracle performance data.
  3. Overview: Collecting Data
     • Developing a well-defined problem statement.
     • Define types of performance data and what is important.
     • Minimal set of data required for performance engagements.
     • Data quality: properly scoped and collected.
     • Show techniques to gather various types of performance data from Oracle.
       > Basic STATSPACK and Automatic Workload Repository (AWR) capabilities
       > Gathering Oracle Trace Data
  4. Developing a problem statement
     • Be as specific as possible using business metrics:
       > Warehouse Inventory user response time increases from 1 to 10 seconds during peak hours (10AM to 1PM PST).
       > The Fulfillment batch job has increased from 1 hour to 2 hours over the past month.
     • Avoid defining performance in terms of system metrics.
       > System cpu% has increased from 10% to 25% during peak hours.
       > This may indicate a potential or future problem, but by itself it is NOT a problem. It is just a symptom.
  5. CPU is not a workload metric!
     • Consider the following:
       > Upgrade from an older v880 server running Solaris 8.
       > New server: m4000 on Solaris 10.
     • CPU% on the new server is at 60% during peak vs the old system at 50% during peak.
       > Panic!!! The new server can't possibly handle any growth!!
       > Escalations ensue, people flap their arms, executives get involved... you get the picture :)
     • Observations
       > Need real metrics like “orders/hr”, etc...
       > CPU% is not a workload metric or a measure of throughput.
       > Solaris 8 often under-reports CPU% vs Solaris 10.... use Tim Cook's utility and blog:
         http://blogs.sun.com/timc/entry/how_event_driven_utilization_measurement
  6. Types of Performance Data
     • Environmental
       > Configuration (HW, OS, Network, IO, and DB)
       > Event/Error logs (“messages” and “alert_xx.log”)
       > System Run logs or ECOs.
     • High Level statistics (Be sure to scope the data!)
       > Business metrics: Orders/min, Shipments/sec, ...
       > iostat, netstat, vmstat, mount, prstat, ps -ecf, ... (guds?)
       > Oracle STATSPACK or AWR
     • Low Level statistics
       > mpstat, trapstat, cpustat, lockstat, DTrace
       > Event 10046 tracing in Oracle.
  7. Scoped and Correlated Data
     • Focus on data around the event
       > I once received a STATSPACK where the report spanned 36 hours ☺
       > Avoid data-overload... I recently received 2GB of trace files
       > Averages have a funny way of distorting problems and pointing you in the wrong direction.
       > User response time and business metrics
     • OS and Database statistics should be from the SAME interval.
       > Often I see an Explorer from midnight with some utilization data paired up with a STATSPACK from the afternoon.
  8. DTrace - Cool, but not the best place to start!
     • Treats Oracle as a BLACK box.
     • Can identify resource consumers, but can NOT tell if this behavior is correct or not.
     • STATSPACK or AWR can provide a DB stats overview.
     • Oracle Event Tracing is best for deep drill-down... the “DTrace” of Oracle.
  9. Oracle Performance data
     • STATSPACK introduced in 8.1.6
       > Replaced tired bstat/estat
       > Workload profiling with persistent storage of perf data
       > More detailed latch and shared pool data
       > Finds HOT SQL statements to aid in SQL tuning.
     • Automatic Workload Repository (AWR) in 10g
       > HTML output!, remote capabilities, sort by CPU and Elapsed time.
     • Trace Wait interface
       > Enhanced in 10g
       > Trace individual processes/sessions via “oradebug”
  10. Overview: Analysis
     • Basic techniques
     • Environmental (logfiles and configuration)
     • STATSPACK / AWR overview
     • Oracle Event Tracing (The “DTrace” of Oracle)
  11. Basic techniques
     ● Start with a well-defined problem
     ● Look for high-level signs of problems
       – alert.log
       – STATSPACK/AWR: (1st page stats)
         > Load profile
         > Top wait events
         > Hit rates
       – Top SQL CPU consumers in AWR reports
     Oracle performance analysis takes years to master... so be patient.
  12. Alert.log analysis
     • Startup time and messages.
       > Restart frequency.
       > init.ora hacking shows up as “_underbar_params”.
     • Errors are reported to the alert.log file.
     • Log file switch frequency.

     Startup message:
       Tue Aug 30 14:01:22 2005
       Starting ORACLE instance (normal)
       LICENSE_MAX_SESSION = 0
       LICENSE_SESSIONS_WARNING = 0
       Picked latch-free SCN scheme 3
       ....
       SYS auditing is disabled
       Starting up ORACLE RDBMS Version: 10.1.0.2.0.
       .....

     Log switches every 71 seconds!!
       Mon Nov 28 14:39:26 2005
       Private_strands 3 at log switch
       Beginning log switch checkpoint up to RBA [0x19d.2.10], SCN: 0x0000.00478e91
       Thread 1 advanced to log sequence 413
       Current log# 1 seq# 413 mem# 0: /export/home/oracle/oradata/GLENNF/redo01.log
       Mon Nov 28 14:40:37 2005
       Private_strands 3 at log switch
       Beginning log switch checkpoint up to RBA [0x19e.2.10], SCN: 0x0000.00478ead
       Thread 1 advanced to log sequence 414
       Current log# 2 seq# 414 mem# 0: /export/home/oracle/oradata/GLENNF/redo02.log
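The log switch interval called out on this slide can be computed instead of eyeballed. Below is a minimal Python sketch, assuming the alert.log uses the timestamp format shown in the sample (`Mon Nov 28 14:39:26 2005`) and that each switch is marked by an "advanced to log sequence" line. The function name `log_switch_intervals` is made up for illustration and is not part of any Oracle tool.

```python
from datetime import datetime

ALERT_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # assumed from the sample on the slide


def log_switch_intervals(alert_lines):
    """Return the seconds between consecutive log switches.

    A switch is attributed to the most recent timestamp line seen
    before an 'advanced to log sequence' message.
    """
    last_stamp = None
    switch_times = []
    for line in alert_lines:
        text = line.strip()
        try:
            last_stamp = datetime.strptime(text, ALERT_TIME_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line
        if "advanced to log sequence" in text and last_stamp is not None:
            switch_times.append(last_stamp)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(switch_times, switch_times[1:])]


sample = [
    "Mon Nov 28 14:39:26 2005",
    "Private_strands 3 at log switch",
    "Thread 1 advanced to log sequence 413",
    "Mon Nov 28 14:40:37 2005",
    "Private_strands 3 at log switch",
    "Thread 1 advanced to log sequence 414",
]
print(log_switch_intervals(sample))  # [71.0]
```

Feeding a whole alert.log through the same function gives the full distribution of switch intervals, which is more telling than a single pair of entries.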
  13. AWR / Statspack Analysis 101
     • GOAL:
       > Give basic guidance when looking at an AWR or STATSPACK report.
     • Answer basic questions like:
       > What is the scope of the data collected?
       > Is this RAC or single instance?
       > How many connections?
       > What is the transaction rate?
       > IO rate? Cache hit rate?
       > How much CPU is being used?
       > What SQL is using the most CPU, IO?
  14. HEADER for Statspack/AWR
     • A fair amount of information can be squeezed just from the header.
     (Report header screenshot; callouts: RAC cluster / 650+ connections... / Sample interval / Shadow Processes)
  15. Scoping issues
     • Example #1. Can you find the issue?

       WORKLOAD REPOSITORY report for

       DB Name         DB Id    Instance     Inst Num Release     RAC Host
       ------------ ----------- ------------ -------- ----------- --- ------------
       PROD          4060419904 PROD2               2 10.2.0.3.0  YES thdtoltpr02

                     Snap Id      Snap Time      Sessions Curs/Sess
                   --------- ------------------- -------- ---------
       Begin Snap:     25738 15-May-08 09:00:12       828      15.6
         End Snap:     25744 15-May-08 13:00:73       832      15.6
          Elapsed:              240.86 (mins)
          DB Time:             2405.07 (mins)

     Notice the gap in Snap IDs? A 4 hour window?? Oracle by default schedules AWR snapshots by the hour.
  16. Scoping issues cont...
     • Example #2: What's wrong with this sample?

       STATSPACK report for

       DB Name         DB Id    Instance     Inst Num Release     Cluster Host
       ------------ ----------- ------------ -------- ----------- ------- ------------
       SWINGBCH       861079668 SWINGBCH            1 9.2.0.6.0   NO      dc1-beta

                     Snap Id     Snap Time      Sessions Curs/Sess Comment
                   --------- ------------------ -------- --------- -------------------
       Begin Snap:       221 27-Apr-07 02:00:06       14      48.6
         End Snap:       223 27-Apr-07 02:30:07    3,017      34.5
          Elapsed:               30.02 (mins)

     The 30min collection interval is good.
     This is an application startup phase. 3000 sessions were added in the 30min interval!!
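The two scoping examples can be turned into a quick sanity check on a report header. The following Python sketch is illustrative only: the function name `scope_warnings` and both thresholds (60 minutes, 2x session growth) are assumptions chosen for the example, not values taken from Oracle or from the slides.

```python
def scope_warnings(begin_snap, end_snap, elapsed_min,
                   begin_sessions, end_sessions,
                   max_elapsed_min=60.0, max_session_growth=2.0):
    """Return a list of human-readable warnings about a report's scope."""
    warnings = []
    # Non-consecutive snap IDs mean several collection intervals were merged.
    if end_snap - begin_snap > 1:
        warnings.append("report spans %d snapshot intervals"
                        % (end_snap - begin_snap))
    # Long windows average away the event you are trying to see.
    if elapsed_min > max_elapsed_min:
        warnings.append("elapsed time %.2f min exceeds %.0f min"
                        % (elapsed_min, max_elapsed_min))
    # A large jump in sessions suggests a startup or ramp-up phase.
    if begin_sessions > 0 and end_sessions / begin_sessions > max_session_growth:
        warnings.append("sessions grew from %d to %d during the interval"
                        % (begin_sessions, end_sessions))
    return warnings


# Example #1: AWR report covering snaps 25738-25744.
example1 = scope_warnings(25738, 25744, 240.86, 828, 832)
# Example #2: STATSPACK report taken during application startup.
example2 = scope_warnings(221, 223, 30.02, 14, 3017)

for warning in example1 + example2:
    print(warning)
```

For Example #1 the check flags the snapshot gap and the long window; for Example #2 the elapsed time passes, but the session ramp-up is flagged.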
  17. Oracle Cache Sizes
     • Shows Default Buffer cache, shared pool, recycle, ...
     • Caches use IPC shared memory.
       > “ipcs -mb” shows segments from the OS point of view
       > “pmap -xs <orapid>” shows pages and sizes from the OS point of view

       Cache Sizes
       ~~~~~~~~~~~                       Begin        End
                                    ---------- ----------
                      Buffer Cache:     5,712M     5,712M  Std Block Size:         8K
                  Shared Pool Size:     2,864M     2,864M      Log Buffer:    14,376K

     With DISM, caches can grow and shrink.
     Oracle block size: 8K is the safest by far. All development and optimizer work is with 8K.
  18. Load Profile
     • How many transactions/sec?
     • IO profile? Query profile?

       Load Profile
       ~~~~~~~~~~~~                       Per Second   Per Transaction
                                     ---------------   ---------------
                       Redo size:      14,529,454.45        506,509.90
                   Logical reads:         154,624.04          5,390.33
                   Block changes:          45,862.25          1,598.80
                  Physical reads:             196.92              6.86
                 Physical writes:             794.24             27.69
                      User calls:             148.29              5.17
                          Parses:              34.47              1.20
                     Hard parses:               0.00              0.00
                           Sorts:              15.67              0.55
                          Logons:               0.29              0.01
                        Executes:              98.55              3.44
                    Transactions:              28.69

        % Blocks changed per Read:   29.66    Recursive Call %:    48.51
       Rollback per transaction %:    0.02       Rows per Sort:  1137.18
  19. Load Profile: Apples and Oranges!
     • As “Joe the DBA” might say:
       – “Nothing's changed”
       – “It's the same application”
     • Verify it is the same... The truth is in the DATA!
     • Key metrics: Logical IO, Physical IO, Transaction profile.

       Load Profile
       ~~~~~~~~~~~~                       Per Second   Per Transaction
                                     ---------------   ---------------
                       Redo size:      14,529,454.45        506,509.90
                   Logical reads:         154,624.04          5,390.33
                   Block changes:          45,862.25          1,598.80
                  Physical reads:             196.92              6.86
                 Physical writes:             794.24             27.69
                      User calls:             148.29              5.17
                          Parses:              34.47              1.20
                     Hard parses:               0.00              0.00
                           Sorts:              15.67              0.55
                          Logons:               0.29              0.01
                        Executes:              98.55              3.44
                    Transactions:              28.69

        % Blocks changed per Read:   29.66    Recursive Call %:    48.51
       Rollback per transaction %:    0.02       Rows per Sort:  1137.18
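One way to check whether "nothing's changed" is to compare the key Load Profile metrics of two reports side by side. The Python sketch below is a hedged illustration: `compare_profiles` and the 2x threshold are assumptions, and the "current" figures are simply the per-second values from the Load Profile on the "warning signs" slide, used here as example input rather than as a claim that both samples come from the same system.

```python
def compare_profiles(baseline, current, threshold=2.0):
    """Return {metric: ratio} for metrics that changed by `threshold`x or more."""
    flagged = {}
    for metric, base_value in baseline.items():
        cur_value = current.get(metric)
        if cur_value is None or base_value == 0 or cur_value == 0:
            continue  # cannot form a meaningful ratio
        ratio = cur_value / base_value
        if ratio >= threshold or ratio <= 1.0 / threshold:
            flagged[metric] = ratio
    return flagged


# Per-second values copied from the two Load Profile samples in the slides.
baseline = {"logical reads": 154624.04,
            "physical reads": 196.92,
            "transactions": 28.69}
current = {"logical reads": 1104645.30,
           "physical reads": 48975.96,
           "transactions": 584.86}

changes = compare_profiles(baseline, current)
for metric, ratio in sorted(changes.items()):
    print("%-15s changed by %.1fx" % (metric, ratio))
```

If the transaction rate, logical IO, and physical IO all move by very different factors, the workload mix is not the same, whatever the application owner says.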
  20. Load Profile... warning signs
     • High physical IO rate.
     • Hard parses... should primarily be soft parses.
     • High “Logons/sec”... use persistent connections!

       Load Profile
       ~~~~~~~~~~~~                       Per Second   Per Transaction
                                     ---------------   ---------------
                       Redo size:       1,282,493.19          2,192.82
                   Logical reads:       1,104,645.30          1,888.74
                   Block changes:           9,286.08             15.88
                  Physical reads:          48,975.96              0.01
                 Physical writes:          11,983.33              0.37
                      User calls:             484.33              0.83
                          Parses:              79.70              0.14
                     Hard parses:               0.14              0.00
                           Sorts:               6.74              0.01
                          Logons:               1.56              0.00
                        Executes:           4,375.60              7.48
                    Transactions:             584.86

        % Blocks changed per Read:    0.84    Recursive Call %:    97.13
       Rollback per transaction %:    1.30       Rows per Sort:    527.7
  21. Instance Efficiency Percentages
     • Buffer Hit rate
       > Values below 99% are suspect for OLTP.
     • Shared Pool “% SQL with exec > 1”
       > Low values mean poor reuse of shared statements
       > SQL without bind variables...

       Instance Efficiency Percentages (Target 100%)
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                   Buffer Nowait %:   99.99       Redo NoWait %:  100.00
                      Buffer Hit %:   98.92    In-memory Sort %:  100.00
                     Library Hit %:  100.15        Soft Parse %:   99.87
                Execute to Parse %:   98.15         Latch Hit %:   99.81
       Parse CPU to Parse Elapsd %:   93.41     % Non-Parse CPU:   99.89

        Shared Pool Statistics         Begin     End
                                      ------  ------
                    Memory Usage %:    68.00   68.06
           % SQL with executions>1:    98.40   95.83
         % Memory for SQL w/exec>1:    96.84   94.28
  22. Top 5 Timed Events
     • Wait events
       > Shows where “Oracle” connections wait.
       > Bad problems usually show up here first.
       > This is an average of all sessions, so treat it as such.
       > This is a good sample of the TOP 5 events
       > CPU and IO are the top events.

       Top 5 Timed Events                                         Avg %Total
       ~~~~~~~~~~~~~~~~~~                                        wait   Call
       Event                                 Waits    Time (s)   (ms)   Time Wait Class
       ------------------------------ ------------ ----------- ------ ------ ----------
       CPU time                                          3,641          60.6
       db file sequential read             268,976       1,375      5   22.9 User I/O
       gc cr grant 2-way                   218,866         384      2    6.4 Cluster
       log file sync                        12,625         131     10    2.2 Commit
       gc current block 2-way               61,056         130      2    2.2 Cluster
       -------------------------------------------------------------
  23. CPU time in Oracle
     • Total amount of CPU seconds during the sample interval.
       > CPU is typically one of the top stats... along with IO.
       > Can calculate CPU utilization!
       > Useful for consolidation since only the CPU time for this instance is considered.

                     Snap Id      Snap Time      Sessions Curs/Sess
                   --------- ------------------- -------- ---------
       Begin Snap:      7380 07-Nov-08 20:00:43     1,375      72.0
         End Snap:      7382 07-Nov-08 20:30:59     1,361      71.9
          Elapsed:               30.27 (mins)
          DB Time:              395.62 (mins)

       Top 5 Timed Events                                         Avg %Total
       ~~~~~~~~~~~~~~~~~~                                        wait   Call
       Event                                 Waits    Time (s)   (ms)   Time Wait Class
       ------------------------------ ------------ ----------- ------ ------ ----------
       CPU time                                         14,845          62.5
       db file sequential read           1,146,234       8,873      8   37.4 User I/O
       db file scattered read               21,784         545     25    2.3 User I/O
       read by other session                53,589         244      5    1.0 User I/O

     14845/(30.27*60) = 8.17 CPUs busy for “usr” time.
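The CPU calculation on this slide generalizes to any report. Here is a small Python sketch of the same arithmetic; the function names are illustrative, the average-active-sessions figure (DB Time divided by elapsed time) is an addition beyond what the slide computes, and the CPU count passed to `utilization_pct` in the last line is a purely hypothetical value since the slide does not say how many CPUs the server has.

```python
def cpus_busy(cpu_time_s, elapsed_min):
    """Average number of CPUs kept busy by this instance during the interval."""
    return cpu_time_s / (elapsed_min * 60.0)


def average_active_sessions(db_time_min, elapsed_min):
    """DB Time divided by wall-clock time: sessions active on average."""
    return db_time_min / elapsed_min


def utilization_pct(busy_cpus, ncpus):
    """Instance CPU utilization as a percentage of the available CPUs."""
    return 100.0 * busy_cpus / ncpus


busy = cpus_busy(14845, 30.27)                  # values from the slide
active = average_active_sessions(395.62, 30.27)
print(round(busy, 2))    # 8.17
print(round(active, 2))  # 13.07
print(round(utilization_pct(busy, 16), 1))      # 16 CPUs is a hypothetical count
```

Because the CPU time covers only this instance, the result is a lower bound on system utilization when several instances or other workloads share the server.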
  24. Drill down on Expensive SQL
     • Which SQL is using the most CPU?
       > Allows you to quickly locate expensive SQL statements... but beware, this might not be the problem :)

                                                            CPU      Elapsd
         Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
       --------------- ------------ -------------- ------ -------- --------- ----------
            18,641,061        2,645        7,047.7   39.9   369.37    372.72 3894562395
       Module: JDBC Thin Client
       insert into PLANARRIV (item, source, dest, transmode, needarrivdate, schedarrivdate, needshipdate, schedshipdate, expdate, qty,
       firmplansw, seqnum, substqty, departuredate, deliverydate, orderplacedate, sourcing ) values ( :1, :2, :3, :4, :5, :6, :7, :8,
       :9, :10, :11, :12, :13, :14, :15, :16, :17 )

             6,867,117          377       18,215.2   14.7    66.63     94.12 1924417985
       Module: JDBC Thin Client
       SELECT sku.item,sku.loc,item.perishablesw,loc.ohpost,sku.oh,sku.ohpost,sku.replentype,skudemandparam.alloccal,
       skudemandparam.ccpsw,skudemandparam.custorderdur,skudemandparam.dmdredid,skudemandparam.dmdtodate,
       skudemandparam.fcstadjrule,skudemandparam.fcstconsumptionrule,skudemandparam.fcstprimconsdur,skudemandparam.fcs
  25. Problem wait events
     • “enq”, “buffer busy”, “latch free”... Often a sign of too many connections or application problems.

       Top 5 Timed Events                                         Avg %Total
       ~~~~~~~~~~~~~~~~~~                                        wait   Call
       Event                                 Waits    Time (s)   (ms)   Time Wait Class
       ------------------------------ ------------ ----------- ------ ------ ----------
       enq: TX - row lock contention       427,949     157,270    367   26.2 Applicatio
       CPU time                                        113,999          19.0
       gc buffer busy                    3,642,184      95,627     26   16.0 Cluster
       gc current block busy             2,264,273      76,874     34   12.8 Cluster
       db file scattered read              351,146      30,238     86    5.0 User I/O

       Event                                 Waits    Time (s)   (ms)   Time Wait Class
       ------------------------------ ------------ ----------- ------ ------ ----------
       gc buffer busy                      556,725     487,263    875   89.7 Cluster
       db file sequential read              10,814       9,982    923    1.8 User I/O
       enq: HW - contention                  7,313         899    123    0.2 Configurat
       CPU time                                            852           0.2
       gc current multi block request          901         796    883    0.1 Cluster

                                                                          % Total
       Event                                               Waits    Time (s) Ela Time
       -------------------------------------------- ------------ ----------- --------
       latch free                                      4,542,675   1,137,914    79.04
       log file sync                                     242,359     164,671    11.44
       buffer busy waits                                 102,540      61,887     4.30
       enqueue                                            35,142      42,498     2.95
       CPU time                                                       25,310     1.76
  26. Problem wait events... “log file sync”
     • Too many connections lead to scheduling issues.
     • Rarely an IO issue... but check log file IO just in case.
     • 2ms or less is desirable
     • Many bugs... Use 10.2.0.4 (Checksum bug #6814520 in 10.2.0.3)

       Top 5 Timed Events                                         Avg %Total
       ~~~~~~~~~~~~~~~~~~                                        wait   Call
       Event                                 Waits    Time (s)   (ms)   Time Wait Class
       ------------------------------ ------------ ----------- ------ ------ ----------
       log file sync                       107,090      82,401    769   29.1 Commit
       enq: HW - contention                 78,617      29,060    370   10.3 Configurat
       db file sequential read              25,928      24,612    949    8.7 User I/O
       gc buffer busy                        7,803       5,906    757    2.1 Cluster

     Further down the AWR you see all wait events...

                                                                          Avg
                                                    %Time  Total Wait    wait     Waits
       Event                                 Waits  -outs    Time (s)    (ms)      /txn
       ---------------------------- -------------- ------ ----------- ------- ---------
       log file sync                       107,090   77.4      82,401     769       4.5
       enq: HW - contention                 78,617   73.7      29,060     370       3.3
       ......
       log file sequential read              3,975     .0          86      22       0.2
       log file parallel write              27,333     .0          86       3       1.1
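The reasoning on this slide is a comparison: sessions wait 769 ms on “log file sync” while the log writer's own “log file parallel write” averages 3 ms, so the redo IO itself is not what is slow. A hedged Python sketch of that comparison follows; the function name and both thresholds (`slow_write_ms`, `ratio`) are assumptions picked for illustration, not Oracle-documented limits.

```python
def log_file_sync_hint(sync_avg_ms, write_avg_ms, slow_write_ms=10.0, ratio=10.0):
    """Give a rough hint about where 'log file sync' time is going.

    sync_avg_ms  -- average 'log file sync' wait seen by sessions
    write_avg_ms -- average 'log file parallel write' time of the log writer
    """
    if write_avg_ms >= slow_write_ms:
        return "redo writes are slow: check log file IO"
    if sync_avg_ms > ratio * write_avg_ms:
        return "redo writes are fast: suspect scheduling or too many connections"
    return "log file sync is in line with redo write time"


# Averages taken from the two tables on this slide.
hint = log_file_sync_hint(sync_avg_ms=769, write_avg_ms=3)
print(hint)
```

With the slide's numbers the sketch reports that the writes are fast, which matches the slide's point that this is rarely an IO issue.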
  27. IO wait events
     • You can get the avg wait for IO from the Top 5 events.
       > Oracle's statistic: “db file sequential read”
         – Storage centric view: “Random single block IO”
       > Oracle's statistic: “db file scattered read”
         – Storage centric view: “Sequential IO”... HUH?

       Top 5 Timed Events                                         Avg %Total
       ~~~~~~~~~~~~~~~~~~                                        wait   Call
       Event                                 Waits    Time (s)   (ms)   Time Wait Class
       ------------------------------ ------------ ----------- ------ ------ ----------
       CPU time                                         17,186          75.3
       db file sequential read             744,522       5,874      8   25.7 User I/O
       db file scattered read               23,809         459     19    2.0 User I/O
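The “Avg wait (ms)” column is just total wait time divided by the number of waits, so it can be recomputed (with more precision than the rounded report column) from the Waits and Time columns. A minimal Python sketch, with an illustrative function name:

```python
def avg_wait_ms(waits, time_s):
    """Average wait per event in milliseconds."""
    return time_s * 1000.0 / waits


# Values from the Top 5 Timed Events table on this slide.
single_block = avg_wait_ms(744522, 5874)  # db file sequential read
multi_block = avg_wait_ms(23809, 459)     # db file scattered read
print(round(single_block, 2), round(multi_block, 2))
```

Rounded to whole milliseconds these give the 8 and 19 shown in the report.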
  28. More IO information...
     • Reads by Tablespace, Datafile, SQL statement, ...

       Tablespace IO Stats           DB/Inst: ITMSCMP/itscr11p  Snaps: 6785-6788
       -> ordered by IOs (Reads + Writes) desc

       Tablespace
       ------------------------------
                        Av      Av     Av                       Av     Buffer Av Buf
                Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)
       -------------- ------- ------ ------- ------------ -------- ---------- ------
       DATA_TS
              477,801     180    8.1     2.6       99,555       37      7,016    5.5
       INDEX_TS
              186,082      70    8.3     1.0       64,924       24     30,214    0.9

       Tablespace               Filename
       ------------------------ ----------------------------------------------------
                        Av      Av     Av                       Av     Buffer Av Buf
                Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)
       -------------- ------- ------ ------- ------------ -------- ---------- ------
       AMG_ALBUM_IDX_TS         /oradata/itmscmp/data2/amg_album_idx_ts01.dbf
                  392       0    7.0     1.0            5        0          0    0.0
       AMG_ALBUM_TS             /oradata/itmscmp/data3/amg_album_ts01.dbf
                7,604       3    7.4     1.0            5        0          2   10.0
  29. Even More IO information...
     • Reads by SQL statement, Database object

       SQL ordered by Reads          DB/Inst: TTOPERF1/ttoperf15  Snaps: 5141-5142
       -> Total Disk Reads: 318,079
       -> Captured SQL account for 133.5% of Total

                       Reads                     CPU     Elapsed
       Physical Reads  Executions    per Exec   %Total Time (s)  Time (s)    SQL Id
       -------------- ----------- ------------- ------ -------- --------- -------------
              811,212           1     811,212.0   51.4   538.11   1013.10 2j2g639a9s4kx
       Module: sqlplus@itscontentrepdb05 (TNS V1-V3)
       select /*+ parallel(ppc, 2) */ count(distinct p.adam_id) from mz_playlist p,
       mz_playlist_price_cache ppc where p.first_production_release is not null and
       p.last_production_release is null and p.playlist_id=ppc.playlist_id and
       (ppc.start_date is NULL or ppc.start_date <= sysdate) and (ppc.end_date is NULL or ppc.end_da

       Segments by Physical Reads    DB/Inst: ITMSCMP/itscr11p  Snaps: 6785-6788
       -> Total Physical Reads: 1,577,615
       -> Captured Segments account for 81.5% of Total

                  Tablespace                      Subobject  Obj.      Physical
       Owner         Name    Object Name            Name     Type         Reads  %Total
       ---------- ---------- -------------------- ---------- ----- ------------ -------
       CONTENT_OW DATA_TS    MZ_PLAYLIST_PRICE_CA            TABLE      723,411   45.85
       CONTENT_OW DATA_TS    MZ_PLAYLIST__LS                 TABLE       87,947    5.57
       CONTENT_OW DATA_TS    MZ_USER_REVIEW                  TABLE       79,534    5.04
       CONTENT_OW DATA_TS    MZ_PRODUCT__LS                  TABLE       52,580    3.33
       CONTENT_OW DATA_TS    MZ_PODCAST_EPISODE_2            TABLE       43,243    2.74
  30. “Who needs iostat?”
     • IO rate information from the Load Profile
       > physical reads/writes per second
     • IO service time(s) from wait events
     • IO broken down by Tablespace and Datafile, etc...
     • Seriously, who needs it?
     • Sorry, you still need “iostat”.
       > Like the CPU time, the IO wait events cover only this instance.
       > Times aren't accurate on an over-processed system.
       > iostat shows IO from the system point of view
       > “storage level” analytics are useful as well!
       > They often don't match due to
         – IO configuration and layout
         – Scheduling
  31. Case Study: Oracle Applications BM
     • Benchmark for DIT (India IRS :)
     • Configuration
       > E20K with 36 USIV @1200MHz
       > Solaris 10 with Oracle 9iR2
     • Oracle Statistics
       > STATSPACK
       > Event trace
     • Problem Statement:
       > Unable to support more than 2000 users within 2 second average response time. The goal is 4000 users. At 2000 users the system is fully utilized at 100% CPU.
  32. Case Study: STATSPACK Data
     • STATSPACK data showed severe latch contention

       Top 5 Timed Events
       ~~~~~~~~~~~~~~~~~~                                                 % Total
       Event                                               Waits    Time (s) Ela Time
       -------------------------------------------- ------------ ----------- --------
       latch free                                     10,597,141   1,425,538    97.52
       CPU time                                                       25,842     1.77
       row cache lock                                    105,066       4,235      .29
       enqueue                                             7,065       2,438      .17
       buffer busy waits                                  23,785       2,195      .15

     • Drill down by CPU, IO, etc... didn't show the problem.

                                                            CPU      Elapsd
         Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
       --------------- ------------ -------------- ------ -------- --------- ----------
           264,391,557       50,560        5,229.3   44.7  1799.33   3566.29 3184176672
       Module: f90runm@sleepy (TNS V1-V3)
       SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE
       _CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_Y
       R,PAN,DT_FILED,NAME,RET_INC FROM SS_RETURN WHERE (SEQ_NO IN (SEL
       ECT a.SEQ_NO FROM ss_return a WHERE A.RANGE_CD = :1 AND A.AO_NO
       = :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and

            75,312,641      113,269          664.9   12.7  1063.39   1451.77 3785480933
       select max(nvl(option$,0)) from sysauth$ where privilege#=:1 con
       nect by grantee#=prior privilege# and privilege#>0 start with (g
       rantee#=:2 or grantee#=1) and privilege#>0 group by privilege#
  33. Case Study: Oracle Trace Top-level
     • Using Oracle event trace allowed us to narrow our focus and concentrate on the true bottleneck.
     • Gathered several *.trc files and used “orasrp” to analyze.
     • Drilled down on “latch free” events as shown in profile below...
  34. Case Study: Oracle Trace “latch free”
     • Drilling down again on statements which contribute the most to “latch free” shows an interesting pattern with the “dual” table... a well known problem in Oracle 9i.
  35. Case Study: Summary
     • OS showed 100% CPU utilization, but no anomalies. DTrace was not helpful here either.
     • STATSPACK provided the starting point of the problem.
     • Oracle Trace interface and “response-time” profiling pinned down the source of the problem.
     • Researched the “dual” table problem on-line (metalink)
       > Problem is fixed in 10g
       > Trick / workaround for 9i.
       > Re-coding to avoid is best!!
  36. Oracle Resources
     • http://metalink.oracle.com - Oracle's Metalink
       > Need an account. Check oracle-interest@sun.com archives for latest.
       > Research bugs, tech tips, download patches, ...
     • http://technet.oracle.com - Oracle's Technet
       > Documentation, white papers, ...
     • http://asktom.oracle.com - Misc questions, mostly DBA but some perf
     • http://www.oraperf.com - Analyzer for STATSPACK files!!
     • http://oracledba.ru/orasrp/ - Oracle Session profiler.
     • http://method-r.com/ - Great papers and insight (Cary Millsap)
     • http://www.orapub.com - Papers, advice, ...
     • Nasty bug for 10.2.0.3: Checksum bug #6814520
  37. More references and resources
     • metalink.oracle.com documents on Trace
       > 245981.1 – Trace wait functionality in 10g
       > 21154.1 – Enabling Tracing (session level)
       > 1058210.6 – Enabling Tracing ORADEBUG
       > 39817.1 – Interpreting Raw trace data
     • Oracle papers
       > Avoiding Common Oracle Performance Problems
       > http://www.sun.com/blueprints/0303/817-1781.pdf
     • Sun Blogs
       > Oracle performance on Sun
       > http://blogs.sun.com/glennf
       > Tim Cook's Solaris 8,9,10 CPU% blog and “old-new” utility.
       > http://blogs.sun.com/timc/entry/how_event_driven_utilization_measurement
  38. Summary
     • Identify and define the problem
     • Collect and identify Oracle performance data
       > Alert.log
       > STATSPACK
       > Oracle Tracing and analysis
     • Know when to say when
       > Use experts to help guide analysis.
       > Avoid Google performance hackers.
  39. Questions?????
     Oracle Analysis 101
     ● Glenn.Fawcett@Sun.com, http://blogs.sun.com/~glennf, Sr. Staff Engineer, Performance Technologies Group
  40. Extra slides...
  41. Where is the Oracle Data?
     • Alert logs & Trace Files
         $ORACLE_HOME/rdbms/log      ##Default
     • Optimal Flexible Architecture (OFA) is common to manage multiple instances
       > Places files in a set location to ease administration.
       > User Trace and Alert.log found in:
         “USER_DUMP_DEST” in init.ora overrides the default.
         “BACKGROUND_DUMP_DEST” for server files... including the “alert.log” file.
     • Full OFA documentation
       http://www.hotsos.com/e-library/abstract.php?id=19
  42. Using STATSPACK
     • Install package from $ORACLE_HOME/rdbms/admin
         SQL> connect / as sysdba
         SQL> @?/rdbms/admin/spcreate      ## Usually not necessary
     • Take snapshots throughout the day. Often an hourly job.
         SQL> connect perfstat/perfstat
         SQL> exec statspack.snap(i_snap_level=>7);
         ... ... run workload ... ...
         SQL> exec statspack.snap(i_snap_level=>7);
     • Run “spreport.sql” and select two intervals
         SQL> @?/rdbms/admin/spreport      ## Run report
     • init.ora “statistics_level=ALL”
       > Necessary to get details about Query plans and Segment statistics.
  43. 43. Using Automatic Workload Repository
      • AWR is installed automatically as part of 10g.
      • Snapshot
          SQL> connect / as sysdba
          SQL> exec dbms_workload_repository.create_snapshot();
          ...run test....
          SQL> exec dbms_workload_repository.create_snapshot();
      • Run “@?/rdbms/admin/awrrpt” and select two snapshots.
  44. 44. Show Query plans
      • Further drill down with ?/rdbms/admin/awrrepsql.sql
        > Get full stats and QEP given the hash value of statement

      SQL Statistics
      ~~~~~~~~~~~~~~
      -> CPU and Elapsed Time are in seconds (s) for Statement Total and in
         milliseconds (ms) for Per Execute
                                                             % Snap
                           Statement Total      Per Execute   Total
                           ---------------  ---------------  ------
              Buffer Gets:       6,867,117         18,215.2   14.71
               Disk Reads:           3,887             10.3    6.54
           Rows processed:         378,635          1,004.3
           CPU Time(s/ms):              67            176.7
       Elapsed Time(s/ms):              94            249.6
                    Sorts:             377              1.0
              Parse Calls:               0               .0
            Invalidations:               0
            Version count:               1
          Sharable Mem(K):             346
               Executions:             377
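The "Per Execute" column in the listing above is simply the statement total divided by the execution count. A minimal Python sketch of that arithmetic follows; the function name `per_execute` is illustrative and not part of any Oracle tool, and the figures are copied from the slide.

```python
# Figures copied from the SQL Statistics listing on the slide.
statement_totals = {
    "Buffer Gets": 6_867_117,
    "Disk Reads": 3_887,
    "Rows processed": 378_635,
}
executions = 377


def per_execute(total, executions):
    """Average per execution, rounded to one decimal as in the report."""
    return round(total / executions, 1)


for name, total in statement_totals.items():
    print(f"{name:<16}{total:>12,}{per_execute(total, executions):>12,.1f}")
```

The CPU and elapsed rows cannot be reproduced exactly this way because the report rounds their totals to whole seconds before printing them. For example, 94 s over 377 executions gives roughly 249 ms, close to the 249.6 ms shown.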
  45. 45. Show Query plans (cont...)
      --------------------------------------------------------------------------------
      | Operation                      | PHV/Object Name     |  Rows | Bytes|   Cost |
      --------------------------------------------------------------------------------
      |SELECT STATEMENT                |----- 1966240984 ----|       |      |  11671 |
      |SORT ORDER BY                   |                     |   974 |  253K|  11671 |
      | NESTED LOOPS OUTER             |                     |   974 |  253K|  11649 |
      |  NESTED LOOPS                  |                     |   960 |  231K|   9729 |
      |   NESTED LOOPS                 |                     |   951 |  192K|   7827 |
      |    NESTED LOOPS                |                     |   952 |  168K|   5923 |
      |     NESTED LOOPS               |                     |   956 |  124K|   4011 |
      |      NESTED LOOPS              |                     |   979 |   84K|   2053 |
      |       HASH JOIN                |                     |    1K |   54K|     43 |
      |        HASH JOIN               |                     |    1K |   46K|     22 |
      |         TABLE ACCESS BY INDEX R|PROCESSSKU           |    1K |   27K|     10 |
      |          INDEX RANGE SCAN      |PROCESSSKU_BATCH     |    1K |      |      4 |
      |         TABLE ACCESS FULL      |LOC                  |    1K |   23K|     11 |
      |        TABLE ACCESS FULL       |ITEM                 |   10K |   87K|     20 |
      |       TABLE ACCESS BY INDEX ROW|SKU                  |     1 |    32|      2 |
      |        INDEX UNIQUE SCAN       |SKU_PK               |     1 |      |      1 |
      |      TABLE ACCESS BY INDEX ROWI|SKUPLANNINGPARAM     |     1 |    45|      2 |
      |       INDEX UNIQUE SCAN        |XPKSKUPLANNINGPARAM  |     1 |      |      1 |
      |     TABLE ACCESS BY INDEX ROWID|SKUDEMANDPARAM       |     1 |    48|      2 |
      |      INDEX UNIQUE SCAN         |XPKSKUDEMANDPARAM    |     1 |      |      1 |
      |    TABLE ACCESS BY INDEX ROWID |SKUDEPLOYMENTPARAM   |     1 |    26|      2 |
      |     INDEX UNIQUE SCAN          |XPKSKUDEPLOYMENTPARA |     1 |      |      1 |
      |   TABLE ACCESS BY INDEX ROWID  |SKUSAFETYSTOCKPARAM  |     1 |    40|      2 |
      |    INDEX UNIQUE SCAN           |XPKSKUSAFETYSTOCKPAR |     1 |      |      1 |
      |  TABLE ACCESS BY INDEX ROWID   |SKUPERISHABLEPARAM   |     1 |    20|      2 |
      |   INDEX UNIQUE SCAN            |XPKSKUPERISHABLEPARA |     1 |      |      1 |
      --------------------------------------------------------------------------------
  46. 46. Drill down on Object Statistics
      • Object statistics...
        > Which objects are doing the most IO?
        > Which objects get the most Buffer Busy Waits?

                                                 Subobject  Obj.      Physical
      Owner      Tablespace Object Name          Name       Type         Reads  %Total
      ---------- ---------- -------------------- ---------- ----- ------------ -------
      STSC       INDX       DFUTOSKUFCST_PK                 INDEX       16,784   28.24
      STSC       DATA       DFUTOSKUFCST                    TABLE       14,792   24.89
      STSC       DATA       SKUPLANNINGPARAM                TABLE        5,412    9.11
      STSC       DATA       SOURCING                        TABLE        3,644    6.13
      STSC       DATA       SKUSAFETYSTOCKPARAM             TABLE        2,923    4.92
      -------------------------------------------------------------

                                                                        Buffer
                                                 Subobject  Obj.          Busy
      Owner      Tablespace Object Name          Name       Type         Waits  %Total
      ---------- ---------- -------------------- ---------- ----- ------------ -------
      STSC       DATA       PLANARRIV                       TABLE       95,897   94.22
      STSC       INDX       PLANARRIV_PK                    INDEX        3,999    3.93
      STSC       DATA       SKU                             TABLE          466     .46
      STSC       INDX       XIF4RECSHIP                     INDEX          391     .38
      STSC       DATA       RECSHIP                         TABLE          347     .34
      -------------------------------------------------------------
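One way to read these listings is to pick out the segments that account for more than some share of the total, and to back out the instance-wide total from a single row. The Python sketch below does both for the Buffer Busy Waits figures on the slide. The helper names `hot_segments` and `implied_total` and the 5% threshold are illustrative assumptions, not something the report itself provides.

```python
# (object name, object type, waits, % of total) copied from the slide.
buffer_busy_waits = [
    ("PLANARRIV", "TABLE", 95_897, 94.22),
    ("PLANARRIV_PK", "INDEX", 3_999, 3.93),
    ("SKU", "TABLE", 466, 0.46),
    ("XIF4RECSHIP", "INDEX", 391, 0.38),
    ("RECSHIP", "TABLE", 347, 0.34),
]


def hot_segments(rows, threshold_pct):
    """Names of segments whose share meets the threshold, largest first."""
    ranked = sorted(rows, key=lambda r: r[3], reverse=True)
    return [name for name, _type, _count, pct in ranked if pct >= threshold_pct]


def implied_total(count, pct):
    """Estimate the overall total from one row's count and percentage."""
    return round(count / (pct / 100.0))


print(hot_segments(buffer_busy_waits, 5.0))
print(implied_total(95_897, 94.22))
```

With a 5% cut-off only PLANARRIV remains, which matches what the listing shows at a glance: nearly all buffer busy waits land on one table, with its primary key index a distant second.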
  47. 47. Using the Trace Wait Interface
      • Oracle tracing is a lot like truss or DTrace for the database.
        > What is a particular “shadow” process doing? (SQL statements, wait events, ...)
        > Trace produces a *.trc file in the udump directory. (Post-process with HOTSOS profiler or ORASRP)
          SQL> connect / as sysdba
          SQL> oradebug setospid 5544
          SQL> oradebug event 10046 trace name context forever, level 12
          ...wait for a while...
          SQL> oradebug event 10046 trace name context off

      ==ora_5544.trc file====
      EXEC #2:c=0,e=324,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=677730703911
      WAIT #2: nam=db file sequential read ela= 5954 p1=1 p2=15356 p3=1
      WAIT #2: nam=db file sequential read ela= 7235 p1=1 p2=14168 p3=1
      FETCH #2:c=10000,e=13869,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=4,tim=677730717849
      STAT #1 id=1 cnt=0 pid=0 pos=1 obj=6251 op=TABLE ACCESS FULL SQLPLUS_PRODUCT_PROFILE (cr=3 r=0 w=0 time=121 us)
      STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op=TABLE ACCESS BY INDEX ROWID OBJ#(18) (cr=3 r=2 w=0 time=13829 us)
      STAT #2 id=2 cnt=1 pid=1 pos=1 obj=36 op=INDEX UNIQUE SCAN OBJ#(36) (cr=2 r=1 w=0 time=6350 us)
      WAIT #1: nam=SQL*Net message to client ela= 3 p1=1650815232 p2=1 p3=0
      WAIT #1: nam=SQL*Net message from client ela= 256 p1=1650815232 p2=1 p3=0
  48. 48. Response Time Profiling Trace Data
      • Collect *.trc file as previously shown via oradebug or ???
      • Analyze files with
        > HOTSOS / Method-R profiler
        > “orasrp” freeware which gives a similar profile to HOTSOS.
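To give a feel for what a profiler does with a raw trace, here is a minimal Python sketch that sums the `ela=` values of `WAIT` lines by event name. It is not how HOTSOS or orasrp work internally; the function name `profile_waits` and the regular expression are assumptions based only on the sample trace lines in the previous slide, and a real profiler also accounts for CPU time, recursive calls, and per-cursor attribution.

```python
import re
from collections import defaultdict

# Matches raw trace wait lines such as:
#   WAIT #2: nam=db file sequential read ela= 5954 p1=1 p2=15356 p3=1
# The optional quotes allow for traces that write nam='event name'.
WAIT_RE = re.compile(r"^WAIT #\d+: nam='?(?P<event>.+?)'? ela= *(?P<ela>\d+)")


def profile_waits(lines):
    """Return (event, total_ela, count) tuples, largest total first."""
    totals = defaultdict(lambda: [0, 0])  # event -> [total_ela, count]
    for line in lines:
        match = WAIT_RE.match(line.strip())
        if match:
            entry = totals[match.group("event")]
            entry[0] += int(match.group("ela"))
            entry[1] += 1
    rows = [(event, ela, count) for event, (ela, count) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Wait lines copied from the sample trace; non-WAIT lines are ignored.
sample = [
    "EXEC #2:c=0,e=324,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=677730703911",
    "WAIT #2: nam=db file sequential read ela= 5954 p1=1 p2=15356 p3=1",
    "WAIT #2: nam=db file sequential read ela= 7235 p1=1 p2=14168 p3=1",
    "WAIT #1: nam=SQL*Net message to client ela= 3 p1=1650815232 p2=1 p3=0",
    "WAIT #1: nam=SQL*Net message from client ela= 256 p1=1650815232 p2=1 p3=0",
]

for event, ela, count in profile_waits(sample):
    print(f"{event:<32}{count:>6}{ela:>10}")
```

On the sample lines, the two single-block reads dominate the profile, which is the kind of ranking a response time profiler surfaces for a full trace file.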