Oracle diagnostics 11g
 

Oracle Database Diagnostics. Presentation file from November 2010


    Oracle diagnostics 11g : Presentation Transcript

    • Oracle Diagnostics (Hemant K Chitale)
    • Hemant K Chitale : whoami ?
      • Oracle 5 to Oracle 10gR2 : DOS, Xenix, 8 flavours of Unix, Linux, Windows
      • Financial Services, Govt/Not-for-Profit, ERP, Custom
      • Production Support, Consulting, Development
      • A DBA, not a Developer
      • Product Specialist, Standard Chartered Bank
      • My Oracle Blog http://hemantoracledba.blogspot.com
    • Outline
      • Running Sessions
      • Tracing
      • Other Debugging
      • no “hands-on”
    • Licensing
      • The OTN Standard Licence : We grant you a nonexclusive, nontransferable limited license to use the programs only for the purpose of developing, testing, prototyping and demonstrating your application, and not for any other purpose. You may not:
        - use the programs for your own internal data processing or for any commercial or production purposes, or use the programs for any purpose except the development of your application;
        - use the application you develop with the programs for any internal data processing or commercial or production purposes without securing an appropriate license from us;
      • The Diagnostic Pack of Oracle Enterprise Manager : Oracle Diagnostics Pack
    • Diagnostics for Running Sessions
      • Long Running SQL Statements
      • Latches and Enqueues
      • Locks and LockTrees
      • “Runaway” Processes
    • Long Running SQLs - 1
      • How do you use LAST_CALL_ET ? A simple query might be :
          select s.sid, s.serial#, s.program, s.machine, s.last_call_et, p.spid
          from v$session s, v$process p
          where s.paddr = p.addr
          and s.last_call_et > 30 -- session active more than 30 seconds
          and s.status = 'ACTIVE'
          and s.type != 'BACKGROUND'
          and s.program not like 'oracle@%P0%'
          order by s.last_call_et desc, s.sid ;
      • Caveat : LAST_CALL_ET is reset at each *call* from a client.
    • Example 1 : (when LAST_CALL_ET cannot flag a long running query)
      I ran a query :
          09:20:28 SQL> l
            1* select * from my_large_table
          09:20:29 SQL> /
      which returned
          1149240 rows selected.
          Elapsed: 00:42:16.68
          10:02:55 SQL>
      in about 42 minutes. However, monitoring the session using LAST_CALL_ET and STATUS = 'ACTIVE' didn't flag it as a long running SQL :
    • 09:20:43 SQL> l
        1 select status, sql_id, last_call_et, event, seq#, state, seconds_in_wait, wait_time_micro
        2 from v$session
        3* where username = 'HEMANT'
      09:20:44 SQL> /
      STATUS    SQL_ID         LAST_CALL_ET  EVENT                         SEQ#   STATE    SECONDS_IN_WAIT  WAIT_TIME_MICRO
      INACTIVE  6b80y82aqw9vm  0             SQL*Net message from client   243    WAITING  0                19555
      09:20:44 SQL>
      09:26:16 SQL> /
      STATUS    SQL_ID         LAST_CALL_ET  EVENT                         SEQ#   STATE    SECONDS_IN_WAIT  WAIT_TIME_MICRO
      INACTIVE  6b80y82aqw9vm  0             SQL*Net message from client   12096  WAITING  0                18256
      09:26:17 SQL>
    • Only *after* the query had ended did I see LAST_CALL_ET incrementing :
      10:02:03 SQL> /
      STATUS    SQL_ID         LAST_CALL_ET  EVENT                         SEQ#   STATE    SECONDS_IN_WAIT  WAIT_TIME_MICRO
      INACTIVE  6b80y82aqw9vm  0             SQL*Net message from client   25516  WAITING  0                13248
      10:02:03 SQL>
      10:03:24 SQL> /
      STATUS    SQL_ID         LAST_CALL_ET  EVENT                         SEQ#   STATE    SECONDS_IN_WAIT  WAIT_TIME_MICRO
      INACTIVE                 31            SQL*Net message from client   27469  WAITING  30               29805792
      10:03:25 SQL>
    • 10:03:33 SQL> /
      STATUS    SQL_ID         LAST_CALL_ET  EVENT                         SEQ#   STATE    SECONDS_IN_WAIT  WAIT_TIME_MICRO
      INACTIVE                 39            SQL*Net message from client   27469  WAITING  38               37820708
      10:03:33 SQL> /
      STATUS    SQL_ID         LAST_CALL_ET  EVENT                         SEQ#   STATE    SECONDS_IN_WAIT  WAIT_TIME_MICRO
      INACTIVE                 49            SQL*Net message from client   27469  WAITING  48               47980099
      10:03:43 SQL>
    • Thus, LAST_CALL_ET was now showing 38 seconds of true inactive time. Notice that SEQ# is not incrementing now: new waits aren't arising. What had been happening was that the session had been rapidly transiting from “ACTIVE” to “INACTIVE” with new waits (SEQ# being incremented) for “SQL*Net message from client”.
    • Here is an extract from the trace file :
      PARSING IN CURSOR #3 len=28 dep=0 uid=184 oct=3 lid=184 tim=1286557710299640 hv=2507024243 ad='32643628' sqlid='6b80y82aqw9vm'
      select * from my_large_table
      END OF STMT
      PARSE #3:c=63990,e=67645,p=213,cr=125,cu=0,mis=1,r=0,dep=0,og=1,plh=1177583212,tim=1286557710299634
      EXEC #3:c=0,e=38,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1177583212,tim=1286557710299770
      WAIT #3: nam='SQL*Net message to client' ela= 12 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710299985
      WAIT #3: nam='direct path read' ela= 477 file number=4 first dba=17715 block cnt=13 obj#=85340 tim=1286557710301366
      WAIT #3: nam='direct path read' ela= 728 file number=4 first dba=17729 block cnt=15 obj#=85340 tim=1286557710303388
      FETCH #3:c=3999,e=3709,p=28,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=1177583212,tim=1286557710303755
      WAIT #3: nam='SQL*Net message from client' ela= 1418 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710305396
      WAIT #3: nam='SQL*Net message to client' ela= 25 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710305640
      FETCH #3:c=0,e=206,p=0,cr=1,cu=0,mis=0,r=25,dep=0,og=1,plh=1177583212,tim=1286557710305717
      WAIT #3: nam='SQL*Net message from client' ela= 63948 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710369770
      WAIT #3: nam='SQL*Net message to client' ela= 5 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710370018
      FETCH #3:c=0,e=117,p=0,cr=1,cu=0,mis=0,r=25,dep=0,og=1,plh=1177583212,tim=1286557710370067
      WAIT #3: nam='SQL*Net message from client' ela= 51133 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710421261
      WAIT #3: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710421421
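The pattern in the trace extract (a FETCH of 25 rows, then a wait for the client, repeated) is easier to see when the WAIT lines are totalled per event. The following is a minimal sketch, not a replacement for tkprof: it assumes the WAIT line layout shown above, and the names `TRACE_LINES`, `WAIT_RE` and `sum_waits` are made up for illustration. The sample lines are copied from the extract.

```python
import re
from collections import defaultdict

# A few lines copied from the trace extract above ("ela" is in microseconds).
TRACE_LINES = [
    "WAIT #3: nam='SQL*Net message to client' ela= 12 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710299985",
    "WAIT #3: nam='direct path read' ela= 477 file number=4 first dba=17715 block cnt=13 obj#=85340 tim=1286557710301366",
    "WAIT #3: nam='direct path read' ela= 728 file number=4 first dba=17729 block cnt=15 obj#=85340 tim=1286557710303388",
    "FETCH #3:c=3999,e=3709,p=28,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=1177583212,tim=1286557710303755",
    "WAIT #3: nam='SQL*Net message from client' ela= 1418 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710305396",
    "WAIT #3: nam='SQL*Net message from client' ela= 63948 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710369770",
    "WAIT #3: nam='SQL*Net message from client' ela= 51133 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710421261",
]

# Matches only WAIT lines; captures the event name and the elapsed microseconds.
WAIT_RE = re.compile(r"^WAIT #\d+: nam='(?P<event>[^']+)' ela=\s*(?P<ela>\d+)")


def sum_waits(lines):
    """Return {event: (wait_count, total_ela_microseconds)} for the WAIT lines."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = WAIT_RE.match(line)
        if match is None:
            continue  # PARSE / EXEC / FETCH lines are skipped
        entry = totals[match.group("event")]
        entry[0] += 1
        entry[1] += int(match.group("ela"))
    return {event: tuple(entry) for event, entry in totals.items()}


if __name__ == "__main__":
    for event, (count, ela) in sum_waits(TRACE_LINES).items():
        print(f"{event:30s} waits={count:3d} ela_us={ela}")
```

Even in these few lines the time spent waiting for the client is far larger than the time spent reading the table.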
    • And this is what the tkprof shows :
      call     count    cpu   elapsed   disk    query   current   rows
      Parse    1        0.00  0.00      0       1       0         0
      Execute  1        0.00  0.00      0       0       0         0
      Fetch    45971    7.86  8.08      16385   61722   0         1149240
      total    45973    7.87  8.09      16385   61723   0         1149240
      Rows     Row Source Operation
      1149240  TABLE ACCESS FULL MY_LARGE_TABLE (cr=61722 pr=16385 pw=0 time=12818149 us cost=4502 size=192733146 card=931078)
      Elapsed times include waiting on following events:
      Event waited on               Times Waited   Max. Wait   Total Waited
      SQL*Net message to client     45971          0.00        0.44
      direct path read              1035           0.00        0.32
      SQL*Net message from client   45971          0.60        2442.69
    • Example 2 : Using LAST_CALL_ET to monitor an SQL that runs in the database.
      I ran this SQL :
          07:31:10 SQL> create table another_large_table as select * from my_large_table;
          Table created.
          07:32:11 SQL>
      This SQL runs entirely in the database.
    • Monitoring shows :
      07:30:45 SQL> l
        1 select status, sql_id, last_call_et, event, seq#, state, seconds_in_wait, wait_time_micro
        2 from v$session
        3* where username = 'HEMANT'
      07:30:50 SQL> /
      STATUS    SQL_ID         LAST_CALL_ET  EVENT                         SEQ#  STATE    SECONDS_IN_WAIT  WAIT_TIME_MICRO
      INACTIVE                 45            SQL*Net message from client   67    WAITING  45               45330503
      07:30:51 SQL>
      07:31:41 SQL> /
      STATUS    SQL_ID         LAST_CALL_ET  EVENT                         SEQ#  STATE    SECONDS_IN_WAIT  WAIT_TIME_MICRO
      ACTIVE    4yf5v6kwvy5am  23            buffer busy waits             909   WAITING  0                229769
    • 07:31:42 SQL> /
      STATUS    SQL_ID         LAST_CALL_ET  EVENT                         SEQ#  STATE    SECONDS_IN_WAIT  WAIT_TIME_MICRO
      ACTIVE    4yf5v6kwvy5am  40            Data file init write          1381  WAITING  0                49629
      07:31:59 SQL>
      07:33:20 SQL>
      07:33:22 SQL> /
      STATUS    SQL_ID         LAST_CALL_ET  EVENT                         SEQ#  STATE    SECONDS_IN_WAIT  WAIT_TIME_MICRO
      INACTIVE  4yf5v6kwvy5am  72            SQL*Net message from client   2136  WAITING  71               71039815
      07:33:22 SQL>
    • 07:35:20 SQL> /
      STATUS    SQL_ID         LAST_CALL_ET  EVENT                         SEQ#  STATE    SECONDS_IN_WAIT  WAIT_TIME_MICRO
      INACTIVE  4yf5v6kwvy5am  190           SQL*Net message from client   2136  WAITING  190              189650841
      07:35:21 SQL>
    • This is from the tkprof :
      call     count   cpu   elapsed   disk    query   current   rows
      Parse    1       0.01  0.03      0       1       0         0
      Execute  1       2.34  5.11      16385   16638   19413     1149240
      Fetch    0       0.00  0.00      0       0       0         0
      total    2       2.35  5.15      16385   16639   19413     1149240
      Misses in library cache during parse: 1
      Optimizer mode: ALL_ROWS
      Parsing user id: 184
      Rows     Row Source Operation
      0        LOAD AS SELECT (cr=17253 pr=16385 pw=16384 time=0 us)
      1149240  TABLE ACCESS FULL MY_LARGE_TABLE (cr=16389 pr=16385 pw=0 time=5145273 us cost=4502 size=192733146 card=931078)
      Elapsed times include waiting on following events:
      Event waited on               Times Waited   Max. Wait   Total Waited
      direct path read              1035           0.18        0.47
      direct path write             522            0.01        0.21
      log buffer space              1              0.02        0.02
      log file switch completion    2              0.07        0.07
      SQL*Net message to client     1              0.00        0.00
      SQL*Net message from client   1              0.00        0.00
    • Learnings :
      From the first Example :
      1. The user/client SQLPlus session noted an execution time of close to 42 minutes. But the database server noted an execution time of only 8.09 seconds.
      2. Querying by LAST_CALL_ET would never have flagged this as a long running query.
      3. 2442 seconds are lost on SQL*Net message waits. The database server process is NOT 'ACTIVE' during them, and LAST_CALL_ET gets reset to 0 at each of these waits.
      From the second Example :
      4. When the SQL runs on the server, although waits do change while it is running, LAST_CALL_ET does not get reset. This is because all the waits are within the one call.
      When running PLSQL :
      5. LAST_CALL_ET reflects the start time of the “top” procedure.
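The arithmetic behind the first three learnings can be checked directly from the figures in Example 1 (the SQL*Plus "Elapsed" line and the tkprof totals). This is only a worked calculation; the function and key names are invented for the sketch.

```python
def to_seconds(hms):
    """Convert an SQL*Plus style 'HH:MI:SS.ss' elapsed string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def summarise_example_1():
    # Figures copied from Example 1 and its tkprof output.
    client_elapsed = to_seconds("00:42:16.68")  # what the SQL*Plus user saw
    server_elapsed = 8.09                       # tkprof "total" elapsed
    client_wait = 2442.69                       # SQL*Net message from client
    fetches = 45971
    rows = 1149240

    return {
        "client_elapsed_s": client_elapsed,
        "share_waiting_on_client": client_wait / client_elapsed,
        "share_in_database": server_elapsed / client_elapsed,
        "rows_per_fetch": rows / fetches,
        "avg_client_wait_ms_per_fetch": 1000.0 * client_wait / fetches,
    }


if __name__ == "__main__":
    for name, value in summarise_example_1().items():
        print(f"{name:32s} {value:12.4f}")
```

Roughly 96% of the 42 minutes is the server waiting for the client to ask for the next batch of about 25 rows, which is why each call is too short for LAST_CALL_ET to build up.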
    • Long Running SQLs - 2
      • How do you use V$SQL_MONITOR ? A simple query might be :
          select username, sid, sql_id, action, elapsed_time, fetches, buffer_gets
          from v$sql_monitor
          where status = 'EXECUTING';
    • Long Running SQLs - 3
      • How do you use V$SESSION_LONGOPS ? A simple query might be :
          select sid, opname, target, sofar, totalwork, units,
          to_char(start_time,'HH24:MI:SS') StartTime,
          elapsed_seconds, time_remaining, message, username
          from v$session_longops
          where sofar != totalwork
          order by start_time ;
      • Caveat : It is based on *operations*, not on the executing SQL. Thus, it is reset at each operation. (The view is populated if the operation takes 6 seconds or more.)
    • Example 3 : Using V$SESSION_LONGOPS without PQ
      I ran this statement :
          08:01:26 SQL> create table a_large_table
          08:01:44 2 as select * from my_large_table union all select * from my_large_table
          08:02:17 3 union all select * from my_large_table;
          Table created.
          08:03:06 SQL>
      This is the Execution Plan :
      PLAN_TABLE_OUTPUT
      Plan hash value: 893555804
      | Id | Operation              | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
      |  0 | CREATE TABLE STATEMENT |                | 2793K |  551M | 31487 (67) | 00:06:18 |
      |  1 |  LOAD AS SELECT        | A_LARGE_TABLE  |       |       |            |          |
      |  2 |   UNION-ALL            |                |       |       |            |          |
      |  3 |    TABLE ACCESS FULL   | MY_LARGE_TABLE |  931K |  183M |  4502  (1) | 00:00:55 |
      |  4 |    TABLE ACCESS FULL   | MY_LARGE_TABLE |  931K |  183M |  4502  (1) | 00:00:55 |
      |  5 |    TABLE ACCESS FULL   | MY_LARGE_TABLE |  931K |  183M |  4502  (1) | 00:00:55 |
      Note
      - dynamic sampling used for this statement (level=2)
    • It started appearing in V$SESSION_LONGOPS only after some time :
      08:02:02 SQL> l
        1 select sid, sql_plan_line_id, sql_plan_operation, opname, target, sofar, totalwork,
        2 units, to_char(start_time,'HH24:MI:SS') StartTime,
        3 elapsed_seconds, time_remaining, message, username
        4 from v$session_longops
        5 where sofar != totalwork
        6* order by start_time
      08:02:41 SQL> /
      no rows selected
      08:02:43 SQL> /
      no rows selected
    • 08:02:48 SQL> /
      SID SQL_PLAN_LINE_ID SQL_PLAN_OPERATION OPNAME
      TARGET SOFAR TOTALWORK UNITS STARTTIM
      ELAPSED_SECONDS TIME_REMAINING
      MESSAGE
      USERNAME
      ----------
      44 4 TABLE ACCESS Table Scan
      HEMANT.MY_LARGE_TABLE 16054 16556 Blocks 08:02:42
      11 0
      Table Scan: HEMANT.MY_LARGE_TABLE: 16054 out of 16556 Blocks done
      HEMANT
      08:02:54 SQL> /
      no rows selected
    • 08:02:57 SQL> /
      SID SQL_PLAN_LINE_ID SQL_PLAN_OPERATION OPNAME
      TARGET SOFAR TOTALWORK UNITS STARTTIM
      ELAPSED_SECONDS TIME_REMAINING
      MESSAGE
      USERNAME
      ----------
      44 5 TABLE ACCESS Table Scan
      HEMANT.MY_LARGE_TABLE 12902 16556 Blocks 08:02:54
      7 2
      Table Scan: HEMANT.MY_LARGE_TABLE: 12902 out of 16556 Blocks done
      HEMANT
      08:03:03 SQL> /
      no rows selected
      08:03:09 SQL>
    • Example 4 : Using V$SESSION_LONGOPS with PQ
      I ran this query :
          07:47:08 SQL> l
            1* select /*+ FULL (a) PARALLEL (a 2) */ count(*) from a_large_table a
          07:47:09 SQL> /
          COUNT(*)
          ----------
          27581760
          07:48:13 SQL> select blocks from user_segments where segment_name = 'A_LARGE_TABLE';
          BLOCKS
          ----------
          394112 (approx 3GB)
    • PLAN_TABLE_OUTPUT
      Plan hash value: 3384045684
      | Id | Operation              | Name          | Rows  | Cost (%CPU)| Time     | TQ    |IN-OUT| PQ Distrib |
      |  0 | SELECT STATEMENT       |               |     1 |  7451  (1) | 00:01:30 |       |      |            |
      |  1 |  SORT AGGREGATE        |               |     1 |            |          |       |      |            |
      |  2 |   PX COORDINATOR       |               |       |            |          |       |      |            |
      |  3 |    PX SEND QC (RANDOM) | :TQ10000      |     1 |            |          | Q1,00 | P->S | QC (RAND)  |
      |  4 |     SORT AGGREGATE     |               |     1 |            |          | Q1,00 | PCWP |            |
      |  5 |      PX BLOCK ITERATOR |               | 3447K |  7451  (1) | 00:01:30 | Q1,00 | PCWC |            |
      |  6 |       TABLE ACCESS FULL| A_LARGE_TABLE | 3447K |  7451  (1) | 00:01:30 | Q1,00 | PCWP |            |
    • Querying for LONGOPS :
      07:48:04 SQL> /
      SID SQL_PLAN_LINE_ID SQL_PLAN_OPERATION OPNAME TARGET SOFAR TOTALWORK UNITS STARTTIM ELAPSED_SECONDS TIME_REMAINING
      MESSAGE
      USERNAME
      ----------
      40 6 TABLE ACCESS Rowid Range Scan HEMANT.A_LARGE_TABLE 14675 15141 Blocks 07:47:57 7 0
      Rowid Range Scan: HEMANT.A_LARGE_TABLE: 14675 out of 15141 Blocks done
      HEMANT
      07:48:04 SQL> /
      no rows selected
      07:48:07 SQL> /
      SID SQL_PLAN_LINE_ID SQL_PLAN_OPERATION OPNAME TARGET SOFAR TOTALWORK UNITS STARTTIM ELAPSED_SECONDS TIME_REMAINING
      MESSAGE
      USERNAME
      ----------
      40 6 TABLE ACCESS Rowid Range Scan HEMANT.A_LARGE_TABLE 12801 15141 Blocks 07:48:04 8 1
      Rowid Range Scan: HEMANT.A_LARGE_TABLE: 12801 out of 15141 Blocks done
      HEMANT
      07:48:13 SQL>
    • Learnings :
      1. Each step in the Execution Plan is a separate Operation. If a Full Table Scan appears more than once OR is executed more than once (e.g. inside a Nested Loop), each execution of that step (operation) is a separate entry in V$SESSION_LONGOPS.
      2. When ParallelQuery is used, each PQ Slave is allocated a certain number of blocks by the QueryCoordinator – e.g. 10,000 blocks. A scan of this range of blocks shows as a “Rowid Range Scan” in V$SESSION_LONGOPS. That is why you would see multiple occurrences of “Full Scan” for a large table FullTableScan using PQ, as each Slave restarts with a new set of blocks.
      3. So, V$SESSION_LONGOPS will not necessarily tell you the expected duration of the SQL statement if there are multiple operations and/OR multiple passes (e.g. Nested Loop OR ParallelQuery).
      4. You can't monitor an import with V$SESSION_LONGOPS.
      5. V$SESSION_LONGOPS can hold 500 entries. Inactive entries are not cleared up immediately.
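To make learning 3 concrete: SOFAR, TOTALWORK and ELAPSED_SECONDS describe one operation only, so any estimate derived from them covers that operation and not the whole statement. The sketch below uses a simple linear extrapolation. That formula is an assumption made for illustration (it is not taken from Oracle documentation), although it does reproduce the TIME_REMAINING values in the listings above. The function name is hypothetical.

```python
def longops_progress(sofar, totalwork, elapsed_seconds):
    """Return (percent_done, estimated_seconds_remaining) for ONE operation.

    Assumes work proceeds at a constant rate, i.e.
    remaining = elapsed * (totalwork - sofar) / sofar.
    """
    percent_done = 100.0 * sofar / totalwork
    if sofar == 0:
        return percent_done, None  # no rate can be derived yet
    remaining = elapsed_seconds * (totalwork - sofar) / sofar
    return percent_done, remaining


if __name__ == "__main__":
    # (SOFAR, TOTALWORK, ELAPSED_SECONDS) rows copied from Examples 3 and 4.
    rows = [
        (16054, 16556, 11),
        (12902, 16556, 7),
        (14675, 15141, 7),
        (12801, 15141, 8),
    ]
    for sofar, totalwork, elapsed in rows:
        pct, remaining = longops_progress(sofar, totalwork, elapsed)
        print(f"{sofar}/{totalwork} blocks: {pct:5.1f}% done, ~{remaining:.1f}s left")
```

In Example 3 the same table is scanned three times (plan lines 3, 4 and 5), so a "2 seconds remaining" figure for line 5 says nothing about work done at any other plan line.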
    • Long Running SQLs - 4
      • How do you use V$ACTIVE_SESSION_HISTORY ? A simple query might be :
          select * from (
          select sample_time, sql_id, sql_plan_line_id, sql_plan_operation,
          current_obj#, seq#, event, p1, p2, p3
          from v$active_session_history
          where session_id = '&sid'
          and sample_time > (systimestamp-(10/1440))
          order by sample_id desc )
          where rownum < 240
          order by sample_time asc ;
      • Caveat : If the session is not 'ACTIVE' (e.g. it is idle, waiting on the client) you won't see any entries. Samples taken while the session is on CPU (e.g. doing logical I/O) do appear, but with a null EVENT.
    • Example 5 : Using V$ACTIVE_SESSION_HISTORY (note : DiagPack Licence reqd !)
      I run this query :
          08:01:39 SQL> select count(*) from a_large_table;
          COUNT(*)
          ----------
          27581760
          08:03:11 SQL>
    • SAMPLE_TIME                SQL_ID         SEQ#   EVENT
      10-OCT-10 08.03.04.213 AM  94duwhgx0jh10  47291  direct path read
      10-OCT-10 08.03.05.213 AM  94duwhgx0jh10  47545  direct path read
      10-OCT-10 08.03.06.223 AM  94duwhgx0jh10  47839  direct path read
      10-OCT-10 08.03.07.223 AM  94duwhgx0jh10  48053  direct path read
      10-OCT-10 08.03.08.223 AM  94duwhgx0jh10  48260  direct path read
      10-OCT-10 08.03.09.223 AM  94duwhgx0jh10  48617  direct path read
      10-OCT-10 08.03.10.233 AM  94duwhgx0jh10  48990  direct path read
      Every one of these samples shows SQL_PLAN_LINE_ID 2 (TABLE ACCESS), CURRENT_OBJ# 85347, P1 = 4 and P3 = 16. The P2 values across the samples are 405072, 409136, 413840, 417264, 420576, 426288 and 432256.
      85 rows selected.
      08:04:31 SQL> l
        1 select * from (
        2 select sample_time, sql_id, sql_plan_line_id, sql_plan_operation, current_obj#, seq#, event, p1,p2,p3 from v$active_session_history where session_id='&sid'
        3 and sample_time > (systimestamp-(10/1440)) order by sample_id desc )
        4 where rownum < 240
        5* order by sample_time asc
      08:04:54 SQL>
    • Another Example : A query with GROUP BY, ORDER BY
      03:42:36 SQL> select country, store_type, count(*)
      03:42:46 2 from store_list
      03:42:48 3 group by country, store_type
      03:42:52 4 order by country, store_type
      03:42:56 5 /
      343 rows selected.
      03:43:53 SQL>
    • Querying V$ACTIVE_SESSION_HISTORY (columns : SAMPLE_TIME, SQL_ID, SQL_PLAN_LINE_ID, SQL_PLAN_OPERATION, CURRENT_OBJ#, SEQ#, EVENT, P1, P2, P3) :
      SAMPLE_TIME                SQL_ID         SEQ#
      11-OCT-10 03.43.08.660 AM  dnhs0zc9ub960  8623
      11-OCT-10 03.43.09.660 AM  dnhs0zc9ub960  9180
      11-OCT-10 03.43.10.660 AM  dnhs0zc9ub960  9443
      11-OCT-10 03.43.11.660 AM  dnhs0zc9ub960  9851
      11-OCT-10 03.43.12.670 AM  dnhs0zc9ub960  10211
      11-OCT-10 03.43.13.670 AM  dnhs0zc9ub960  10593
      . . .
    • 11-OCT-10 03.43.17.680 AM  dnhs0zc9ub960  12548
      11-OCT-10 03.43.18.680 AM  dnhs0zc9ub960  13118
      11-OCT-10 03.43.19.680 AM  dnhs0zc9ub960  13579
      11-OCT-10 03.43.20.680 AM  dnhs0zc9ub960  14100
      11-OCT-10 03.43.21.690 AM  dnhs0zc9ub960  14422
      11-OCT-10 03.43.22.690 AM  dnhs0zc9ub960  14884
      . . .
      11-OCT-10 03.43.51.770 AM  dnhs0zc9ub960  31661
      11-OCT-10 03.43.52.780 AM  dnhs0zc9ub960  32284
      03:44:23 SQL>
      Most samples show SQL_PLAN_LINE_ID 2 (TABLE ACCESS) waiting on “direct path read” with P1 = 4, P3 = 16 and CURRENT_OBJ# 85347; a few show SQL_PLAN_LINE_ID 1 (SORT).
    • Notice that the samples are once every second – thus, NOT *every* Wait is reported. SEQ# increases significantly within each second.
    • Learnings :
      1. A snapshot is obtained every second. Thus, NOT *every* Wait is reported. SEQ# increases significantly within each second.
      2. The view does *NOT* get reset at each new SQL.
      3. If you query only by SID, you might see data for two different sessions – one that had logged out and a second, new session reusing the same SID. So, always query by SESSION_ID and SESSION_SERIAL# together.
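Learning 1 can be quantified from the SEQ# column: each new wait increments SEQ#, so the difference between two consecutive samples is the number of waits that happened in that interval, of which the view shows only one. A small sketch using the SEQ# values from Example 5 (the function name is made up, and the sketch does not handle SEQ# wrapping around):

```python
# (sample time, SEQ#) pairs copied from the Example 5 output.
SAMPLES = [
    ("08.03.04.213", 47291),
    ("08.03.05.213", 47545),
    ("08.03.06.223", 47839),
    ("08.03.07.223", 48053),
    ("08.03.08.223", 48260),
    ("08.03.09.223", 48617),
    ("08.03.10.233", 48990),
]


def waits_between_samples(samples):
    """Return the SEQ# increase between each pair of consecutive samples."""
    seqs = [seq for _, seq in samples]
    return [later - earlier for earlier, later in zip(seqs, seqs[1:])]


if __name__ == "__main__":
    deltas = waits_between_samples(SAMPLES)
    print("waits per interval:", deltas)
    print("waits in total    :", sum(deltas))
    print("samples recorded  :", len(SAMPLES))
```

Over these six seconds the session went through about 1,700 waits while the view recorded seven samples, so the history is a statistical picture rather than a complete wait log.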
    • Long Running SQLs - 5
      • How do you identify “expensive” SQLs from V$SQLSTATS ? An example query might be :
          select * from
          (select cpu_time, elapsed_time, application_wait_time, concurrency_wait_time,
          user_io_wait_time, disk_reads, buffer_gets, rows_processed, executions,
          px_servers_executions, last_active_time, sql_id, sql_text
          from v$sqlstats
          order by 2 desc )
          where rownum < 11 ;
          -- change the ORDER BY clause as appropriate
    • Querying for SQLs with largest Elapsed_Time :
      CPU_TIME ELAPSED_TIME APPLICATION_WAIT_TIME CONCURRENCY_WAIT_TIME USER_IO_WAIT_TIME DISK_READS BUFFER_GETS ROWS_PROCESSED
      EXECUTIONS PX_SERVERS_EXECUTIONS LAST_ACTI SQL_ID
      SQL_TEXT
      ----------
      25744086 69716822 0 0 65719693 786452 786469 2
      2 0 11-OCT-10 5j3srpqk9ut8p
      select count(*) from STORE_LIST x
      30791318 45650857 0 0 29723427 393228 393260 343
      1 0 11-OCT-10 dnhs0zc9ub960
      select country, store_type, count(*) from store_list group by country, store_type order by country, store_type
      9488558 29879924 0 0 28339308 302096 302075 0
      1 0 11-OCT-10 20dyx0fhnrcv2
      select count(*) from store_list
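The figures in V$SQLSTATS are cumulative across executions, so the statement with the largest ELAPSED_TIME is not necessarily the slowest per execution. A short sketch using the three rows above (ELAPSED_TIME is in microseconds; the helper name is invented for illustration):

```python
# (SQL_ID, ELAPSED_TIME in microseconds, EXECUTIONS) copied from the listing above.
ROWS = [
    ("5j3srpqk9ut8p", 69716822, 2),
    ("dnhs0zc9ub960", 45650857, 1),
    ("20dyx0fhnrcv2", 29879924, 1),
]


def rank_by_elapsed_per_execution(rows):
    """Return [(sql_id, seconds_per_execution)] sorted slowest first."""
    per_exec = []
    for sql_id, elapsed_us, executions in rows:
        if executions == 0:
            continue  # statement still on its first execution; no average yet
        per_exec.append((sql_id, elapsed_us / executions / 1_000_000))
    return sorted(per_exec, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for sql_id, seconds in rank_by_elapsed_per_execution(ROWS):
        print(f"{sql_id}  {seconds:8.2f} s per execution")
```

Here the first statement tops the list on total elapsed time only because it ran twice; per execution, the GROUP BY query (dnhs0zc9ub960) is the slowest at about 45.7 seconds.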
    • Long Running SQLs - 6
      • How do you use V$TRANSACTION to monitor DML ? An example query might be :
          select s.sid, s.serial#, p.spid, s.username, s.program,
          t.xidusn, t.used_ublk, t.used_urec, t.start_time, s.last_call_et
          from v$process p, v$session s, v$transaction t
          where s.paddr = p.addr
          and s.taddr = t.addr
          order by s.sid ;
    • Example : A large transaction :
      04:19:04 SQL> update store_list set store_type = 'DEPT-X' where store_type = 'DEPT-L';
      396360 rows updated.
      04:24:41 SQL>
      04:28:58 SQL> commit;
      Commit complete.
      04:29:00 SQL>
    • 04:20:00 SQL> /
      SID  SERIAL#  SPID  USERNAME  PROGRAM                                     XIDUSN  USED_UBLK  USED_UREC  START_TIME         LAST_CALL_ET
      46   114      4402  HEMANT    sqlplus@localhost.localdomain (TNS V1-V3)   4       124        14313      10/11/10 04:19:25  36
      04:20:01 SQL> /
      SID  SERIAL#  SPID  USERNAME  PROGRAM                                     XIDUSN  USED_UBLK  USED_UREC  START_TIME         LAST_CALL_ET
      46   114      4402  HEMANT    sqlplus@localhost.localdomain (TNS V1-V3)   4       382        44499      10/11/10 04:19:25  48
      04:20:13 SQL>
      04:22:01 SQL> /
      SID  SERIAL#  SPID  USERNAME  PROGRAM                                     XIDUSN  USED_UBLK  USED_UREC  START_TIME         LAST_CALL_ET
      46   114      4402  HEMANT    sqlplus@localhost.localdomain (TNS V1-V3)   4       3034       355570     10/11/10 04:19:25  156
      04:22:01 SQL>
    • 04:24:43 SQL> /
      SID  SERIAL#  SPID  USERNAME  PROGRAM                                     XIDUSN  USED_UBLK  USED_UREC  START_TIME         LAST_CALL_ET
      46   114      4402  HEMANT    sqlplus@localhost.localdomain (TNS V1-V3)   4       5918       442000     10/11/10 04:19:25  2
      04:24:44 SQL>
      04:28:59 SQL> /
      no rows selected
      04:29:01 SQL>
    • Learnings :
      1. USED_UREC doesn't actually map to the number of Table Rows (or Table Rows + (Table Rows * No. of Indexes updated)). An Undo Record is not a 1-to-1 match to Table + Index records. However, you can still use it to extrapolate or compare transaction sizes if the table and index definitions are similar.
      2. USED_UREC is useful to monitor *rollback*. If a transaction has failed internally and is automatically being rolled back, you would see USED_UREC declining well before you get a transaction failure error.
      3. Direct Path Operations (e.g. INSERT /*+ APPEND */) would reflect as only 1 Undo Record – the table has to be locked for the duration of the INSERT.
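One way to use USED_UREC for monitoring is to turn successive samples into a rate: a positive rate means the transaction is still generating undo, a negative one means it is rolling back. The sketch below uses the three samples taken while the UPDATE in the example was running, with LAST_CALL_ET as the clock (valid here because all three fall within the same call). The function name is hypothetical.

```python
# (LAST_CALL_ET seconds, USED_UREC) samples copied from the example above,
# all taken while the single UPDATE call was still running.
SAMPLES = [
    (36, 14313),
    (48, 44499),
    (156, 355570),
]


def undo_record_rates(samples):
    """Return undo records per second between consecutive samples.

    A negative value would indicate that USED_UREC is shrinking,
    i.e. that the transaction is rolling back.
    """
    rates = []
    for (t0, urec0), (t1, urec1) in zip(samples, samples[1:]):
        rates.append((urec1 - urec0) / (t1 - t0))
    return rates


if __name__ == "__main__":
    for rate in undo_record_rates(SAMPLES):
        direction = "rolling back" if rate < 0 else "generating undo"
        print(f"{rate:10.1f} undo records/sec ({direction})")
```

The rate stays in the region of 2,500 to 2,900 undo records per second, which is the kind of figure that can be compared across similar transactions, as the first learning suggests.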
    • Latches and Enqueues
      • Latches are points of concurrency ; Enqueues are points of serialisation.
      • Three sessions may be waiting on a Latch – any of the three may get the Latch before the other two.
      • If sessions are waiting on an Enqueue, access is serialised – the first waiter gets the Enqueue first.
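The difference in grant order can be shown with a toy model. This is only an illustration of the ordering described on the slide and says nothing about how Oracle implements either mechanism: the enqueue is modelled as a first-in first-out queue, and the latch as "whichever waiter happens to retry at the right moment", represented here by a random choice. All names are hypothetical.

```python
import random
from collections import deque


def enqueue_grant_order(waiters):
    """Enqueue model: waiters are served strictly in arrival order."""
    queue = deque(waiters)
    order = []
    while queue:
        order.append(queue.popleft())
    return order


def latch_grant_order(waiters, rng):
    """Latch model: no queue; any waiter may win each time the latch frees up."""
    pending = list(waiters)
    order = []
    while pending:
        winner = rng.choice(pending)  # stands in for "whoever retries first"
        pending.remove(winner)
        order.append(winner)
    return order


if __name__ == "__main__":
    sessions = ["S1", "S2", "S3"]
    print("enqueue:", enqueue_grant_order(sessions))
    print("latch  :", latch_grant_order(sessions, random.Random()))
```

Running it several times gives the same enqueue order every time, while the latch order varies from run to run.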
    • Latches
      • Identify Latches with V$LATCHNAME, V$LATCH, V$LATCH_PARENT, V$LATCH_CHILDREN
      • Latch Holders are in V$LATCHHOLDER
      • “willing-to-wait” and “no-wait” latches
      • “gets”, “misses” and “sleeps” – 1,2,3…
      • Most commonly known :
        – library cache ; shared pool
        – cache buffers chains
    • Identifying Latches (listing from 11.2) :
      SQL> select count(*) from v$latchname;
      COUNT(*)
      ----------
      535
      SQL> select count(*) from v$latch;
      COUNT(*)
      ----------
      535
      SQL> select count(*) from v$latch_parent;
      COUNT(*)
      ----------
      535
      SQL> select count(*) from v$latch_children;
      COUNT(*)
      ----------
      2773
      SQL> desc v$latchholder
      Name    Null?   Type
      PID             NUMBER
      SID             NUMBER
      LADDR           RAW(4)
      NAME            VARCHAR2(64)
      GETS            NUMBER
      SQL>
    • Refer to Oracle Support Document ID “What are Latches and What Causes Latch Contention [ID 22908.1]” for sample queries for latches /* ** Display System-wide latch statistics. */ /* ** Given a latch address, find out the latch name. */ /* ** Display latch statistics by latch name. */
    • Library Cache Latch usage : This latch protects SQL statements, object definitions etc. Oracle internally determines the number of latches available (as a prime number). Adding new SQL statements and Objects requires the Latch. One latch would be protecting multiple SQLs. Note : If you have DDLs modifying object definitions, there would be waits on the Library Cache Latches protecting those objects.
      Library Cache Pin Latch usage : When a statement is re-executed, it has to be pinned to ensure that it is not modified !
      Shared Pool Latch : This latch protects the allocation of memory in the Shared Pool. Multiple child latches are created. Frequent Hard Parsing of SQLs would cause frequent access to the Shared Pool Latch.
    • Cache Buffers Chains Latch : Protects Memory Buffers for Database Blocks. Multiple (Child) Latches are present, each Latch protecting multiple buffers (blocks). Blocks are loaded into memory based on a hash of the Database Block Address. A Linked List of the Buffer Headers is maintained so that a Block can be found quickly in the Buffer Cache.
      Look for CBC Latches with very high SLEEPs – indicating very frequent retries. Don't run this query – it will take a long time on a busy instance / large database.
      -- query from How To Identify a Hot Block Within The Database Buffer Cache. [ID 163424.1]
      … query text deleted …
      Typically Hot Blocks can be Index Root / Branch Blocks when an Index is frequently used in a Nested Loop. So, look at the Sessions and SQLs and Execution Plans.
    • Cloned Buffers : Buffers are “cloned” when different sessions require different versions for Read Consistency.
      Assume User “A” started a query at time t0.
      Assume User “C” modified Buffer 123 (representing a specific table block) at time t5 with an UPDATE statement.
      If User “A”'s session comes to the same table block, the DBA (DataBlockAddress) requires it to read Buffer 123. It finds, from the Buffer Header SCN, that the Block has been modified. The ITL entry identifies the Undo Segment and slot. From the Undo information, the session now has to recreate the “pre-change” image of the block. So, it “clones” the Buffer as another Buffer in memory and applies the Undo information to it – because it actually has to modify the block, which it cannot [and should not] do against the “dirty” Buffer 123 last updated by User “C”.
      The same buffer can have multiple clones. Also, a session might have to keep “rolling back” a block through multiple updates to get to its desired state (SCN).
      So : A Buffer Cache of 800MB doesn't necessarily mean that you have 800MB of data; some of it could be multiple copies of the same database block, as of different points in time.
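The clone-and-roll-back idea can be sketched with a toy model. Everything in it is hypothetical (the dictionary standing in for a block, the undo list, the SCN numbers); it only illustrates the principle that the reader never changes the current buffer but copies it and applies undo, newest change first, until the copy is as of the reader's SCN.

```python
def consistent_read(current_block, undo_chain, query_scn):
    """Return a read-consistent clone of a block as of query_scn.

    current_block: {row_id: value} for the current ("dirty") buffer.
    undo_chain:    list of {"scn", "row", "old_value"}, newest change first.
    """
    clone = dict(current_block)  # the current buffer itself is never modified
    for undo in undo_chain:
        if undo["scn"] <= query_scn:
            break  # this change and all older ones are visible to the query
        clone[undo["row"]] = undo["old_value"]  # roll this change back
    return clone


if __name__ == "__main__":
    # Hypothetical history of one row: A at the start, B at SCN 5, C at SCN 8.
    buffer_123 = {"row1": "C"}
    undo = [
        {"scn": 8, "row": "row1", "old_value": "B"},
        {"scn": 5, "row": "row1", "old_value": "A"},
    ]
    for scn in (0, 6, 9):
        print(f"query as of SCN {scn}:", consistent_read(buffer_123, undo, scn))
    print("current buffer still:", buffer_123)
```

Two queries that started at different times end up with two different clones of the same block, which is how one block can occupy several buffers at once.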
    • Cache Buffers LRU Chain Latch : The LRU Chain is a list of buffers. Oracle maintains multiple lists. A process that needs to load a block into memory “walks” the chain to identify a buffer that can be “used” (e.g. an empty or clean buffer). If it cannot find a buffer, it marks a list of dirty buffers for DBWR to flush to disk. When the buffers are flushed to disk, they are marked “clean” and can be reused. This latch is required whenever changes are to be made to a list. Waits on these would mean that DBWR isn't fast enough.
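    • A rough check on whether DBWR is keeping up is to look at the LRU Chain latch statistics together with the wait events that sessions post when they cannot find a free buffer. This is a sketch, and the choice of these two events as indicators is an assumption.

```sql
-- Contention on the LRU Chain latch since instance startup
select name, gets, misses, sleeps
from   v$latch
where  name = 'cache buffers lru chain';

-- Sessions waiting for DBWR to free up or finish writing buffers
select event, total_waits, time_waited
from   v$system_event
where  event in ('free buffer waits', 'write complete waits');
```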
    • Learnings :
      1. Frequent Hard Parses strain the Shared Pool and Library Cache Latches.
      2. Hot Blocks cause waits on particular Cache Buffers Chains Latches. The Hot Blocks need to be identified based on the SQLs of the Sessions waiting on CBC Latches. Fixes could be SQL tuning, rebuilding table/index, reverse key indexes (avoid them !!).
      3. If DBWR isn't fast enough (waits on the Cache Buffers LRU Chain Latch) : move to faster I/O, use Async I/O, add DB_WRITER_PROCESSES only as a last resort.
      4. Latch Issues point to Concurrency issues.
    • Enqueues • Enqueues are Queuing Mechanisms • The most famous ones are Row Locks (“TX”) and DML Locks (“TM”) • There are 64 different types of Enqueues (11.2) [62 in 10.2] See Appendix D of the Reference Guide
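    • The enqueue types known to a given instance can also be listed from the data dictionary, which is a quick cross-check against the Reference Guide. This is a minimal sketch using V$LOCK_TYPE (available from 10g). The number of rows returned will differ by version.

```sql
-- All enqueue / lock types with a short description
select type, name, description
from   v$lock_type
order  by type;

-- Just the two best-known ones
select type, name, id1_tag, id2_tag
from   v$lock_type
where  type in ('TX', 'TM');
```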
    • The Controlfile (CF) Enqueue is taken when LGWR or ARCH is updating the Controlfile or when CKPT is updating checkpoint information. NOLOGGING operations also take CF enqueues ! I've seen database instances crash when the CF enqueue is held for too long by one background process. RMAN uses a snapshot controlfile to avoid CF enqueues.
      The Undo Segment (US) Enqueue is taken when adding undo segments, taking them online or offline. When a “storm” of activity occurs, you may find US enqueue waits.
      The Space Transaction (ST) Enqueue is for allocation / deallocation of extents.
      The Object Checkpoint (KO) Enqueue is for Oracle to checkpoint an object/segment, e.g. for a TRUNCATE or DROP or before Parallel Query.
      The Sequence Number (SQ) Enqueue is for incrementing Sequences. Setting appropriate CACHE sizes is important.
      The Job Queue (JQ) Enqueue is for Jobs.
      Cross Instance (CI) Enqueue doesn't appear only in RAC ! You will see requests and waits in non-RAC as well.
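    • For the SQ enqueue, a first step is to find sequences with small caches. The sketch below assumes access to DBA_SEQUENCES. The threshold of 20 (the default CACHE value), the sequence name APP_OWNER.ORDER_SEQ and the new cache size of 1000 are hypothetical, for illustration only.

```sql
-- Application sequences with NOCACHE or a cache smaller than the default of 20
select sequence_owner, sequence_name, cache_size, order_flag
from   dba_sequences
where  cache_size < 20
and    sequence_owner not in ('SYS', 'SYSTEM')
order  by sequence_owner, sequence_name;

-- Hypothetical example : raise the cache on a heavily used sequence
alter sequence app_owner.order_seq cache 1000;
```

      A larger cache means fewer dictionary updates per NEXTVAL, at the cost of larger gaps in the numbers after an instance restart.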
    • Enqueue Waits in an *idle* instance :
      SQL> desc v$enqueue_stat
       Name                          Null?    Type
       ----------------------------- -------- --------------------
       INST_ID                                NUMBER
       EQ_TYPE                                VARCHAR2(2)
       TOTAL_REQ#                             NUMBER
       TOTAL_WAIT#                            NUMBER
       SUCC_REQ#                              NUMBER
       FAILED_REQ#                            NUMBER
       CUM_WAIT_TIME                          NUMBER

      SQL> select eq_type, total_req#, total_wait# from v$enqueue_stat where total_wait# > 0 order by 1;

      EQ TOTAL_REQ# TOTAL_WAIT#
      -- ---------- -----------
      CF       2895           1     -- controlfile
      JS      64043          15     -- not documented
      KO         18           1     -- multiple object checkpoint
      PR        358           3     -- process startup
      PV         53           3     -- not documented
      TH        361           1     -- not documented

      6 rows selected.

      SQL>
    • After running :
      SQL> create table abc as select * from dba_objects
        2  union all select * from dba_objects
        3  union all select * from dba_objects;

      Table created.

      EQ TOTAL_REQ# TOTAL_WAIT#
      -- ---------- -----------
      KO         27           2

      On running :
      SQL> select /*+ PARALLEL (a 4) */ count(*) from abc a;

        COUNT(*)
      ----------
          229848

      SQL>

      EQ TOTAL_REQ# TOTAL_WAIT#
      -- ---------- -----------
      KO         36           3
    • Learnings : 1. There are many different points of serialisation other than Row Locks. 2. Watch out for critical enqueues – CF, TS, SQ, KO.
    • Locks and Lock Trees • Row Locks are Enqueues • They serialise access to rows • A transaction may hold Row Locks on multiple rows – this is represented as a single entry in V$TRANSACTION but single or multiple entries in the ITL slots in various table / index blocks • ITLs allow different transactions to lock different rows in the same block concurrently. • Lock Trees are multiple sessions waiting “in order”, with potentially more than one session waiting on the same row lock
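    • The “single entry in V$TRANSACTION” point can be checked directly. The sketch below lists each active transaction once, with the session that owns it, and then the TX enqueues held or requested. It assumes a single instance; the column choice is illustrative.

```sql
-- One row per active transaction, however many rows it has locked
select s.sid, s.username, t.xidusn, t.xidslot, t.xidsqn,
       t.start_time, t.used_urec
from   v$transaction t, v$session s
where  s.taddr = t.addr
order  by s.sid;

-- TX enqueues : LMODE > 0 is held, REQUEST > 0 is being waited for
select sid, type, id1, id2, lmode, request, block
from   v$lock
where  type = 'TX'
order  by id1, id2, request;
```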
    • A Lock Tree : Script “utllockt.sql” can provide a tree-like diagram.
      This script prints the sessions in the system that are waiting for locks, and the locks that they are waiting for. The printout is tree structured. If a sessionid is printed immediately below and to the right of another session, then it is waiting for that session. The session ids printed at the left hand side of the page are the ones that everyone is waiting for. For example, in the following printout session 9 is waiting for session 8, 7 is waiting for 9, and 10 is waiting for 9.

      WAITING_SESSION   TYPE MODE REQUESTED    MODE HELD         LOCK ID1 LOCK ID2
      ----------------- ---- ----------------- ----------------- -------- --------
      8                 NONE None              None              0        0
         9              TX   Share (S)         Exclusive (X)     65547    16
            7           RW   Exclusive (X)     S/Row-X (SSX)     33554440 2
            10          RW   Exclusive (X)     S/Row-X (SSX)     33554440 2

      The lock information to the right of the session id describes the lock that the session is waiting for (not the lock it is holding).
      The script can be enhanced to provide more session information. The script uses DDLs to drop and create temp tables, so another enhancement would be to have those tables created in advance as GTTs and only populated and queried by the script.
    • select s.blocking_session, to_number(s.sid) Waiting_Session, s.event, s.seconds_in_wait,
             p.pid, p.spid "ServerPID", s.process "ClientPID",
             s.username, s.program, s.machine, s.osuser, s.sql_id,
             substr(sq.sql_text,1,75) SQL
      from   v$sql sq, v$session s, v$process p
      where  s.event like 'enq: TX%'
      and    s.paddr=p.addr
      and    s.sql_address=sq.address
      and    s.sql_hash_value=sq.hash_value
      and    s.sql_id=sq.sql_id
      and    s.sql_child_number=sq.child_number
      union all
      select s.blocking_session, to_number(s.sid) Waiting_Session, s.event, s.seconds_in_wait,
             p.pid, p.spid "ServerPID", s.process "ClientPID",
             s.username, s.program, s.machine, s.osuser, s.sql_id,
             substr(sq.sql_text,1,75) SQL
      from   v$sql sq, v$session s, v$process p
      where  s.sid in (select distinct blocking_session from v$session where event like 'enq: TX%')
      and    s.paddr=p.addr
      and    s.sql_address=sq.address(+)
      and    s.sql_hash_value=sq.hash_value(+)
      and    s.sql_id=sq.sql_id(+)
      and    s.sql_child_number=sq.child_number(+)
      order  by 1 nulls first, 2
      /
    • Two separate sessions :
      SQL> connect / as sysdba
      Connected.
      SQL> update hemant.test_row_lock set content = 'Another' where pk=1;

      1 row updated.

      SQL>

      SQL> connect hemant/hemant
      Connected.
      SQL> update test_row_lock set content = 'First' where pk=1;
      ….. now waiting …..
    • BLOCKING_SESSION WAITING_SESSION EVENT                          SECONDS_IN_WAIT        PID
      ---------------- --------------- ------------------------------ --------------- ----------
      ServerPID    ClientPID    USERNAME   PROGRAM
      ------------ ------------ ---------- -----------------------------------------
      MACHINE                OSUSER     SQL_ID
      ---------------------- ---------- -------------
      SQL
      ---------------------------------------------------------------------------
                                    17 SQL*Net message from client                 82         27
      13788        13770        SYS        sqlplus@localhost.localdomain (TNS V1-V3)
      localhost.localdomain  oracle

                    17              26 enq: TX - row lock contention               52         19
      13791        3449         HEMANT     sqlplus@localhost.localdomain (TNS V1-V3)
      localhost.localdomain  oracle     fuwn3bnuh2axg
      update test_row_lock set content = 'First' where pk=1

      SQL>
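    • The same BLOCKING_SESSION column can also be walked hierarchically to print an indented tree, similar in shape to the utllockt.sql output but without any DDL. This is a sketch for a single instance only. On RAC it would need GV$SESSION and BLOCKING_INSTANCE as well, which is not shown here.

```sql
-- Root rows are sessions that block others but are not blocked themselves;
-- each waiter is indented under the session it is waiting for
select lpad(' ', 3 * (level - 1)) || sid as waiting_session,
       blocking_session, event, seconds_in_wait, sql_id
from   v$session
start  with blocking_session is null
       and  sid in (select blocking_session
                    from   v$session
                    where  blocking_session is not null)
connect by prior sid = blocking_session
order  siblings by sid;
```

      With the two sessions shown above, this would be expected to list session 17 at the left margin and session 26 indented beneath it.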
    • “Runaway” Processes • A “Runaway” is a process continuously taking CPU on the server, with no corresponding Client process, because the client was abruptly terminated • This is where you can use queries by LAST_CALL_ET – because there is no client that is sending calls, the LAST_CALL_ET keeps increasing • But remember : You MUST check to see if there is a client (or Application Server) that has submitted the SQL and *is* legitimately waiting for the results.
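    • A query by LAST_CALL_ET along these lines could be used to shortlist candidates. It is a sketch, and the one-hour threshold (3600 seconds) is an arbitrary assumption to adjust for the application. The result is only a list of suspects : check the ClientPID / MACHINE against the client or Application Server before killing anything.

```sql
-- User sessions that have been ACTIVE in the same call for over an hour
select s.sid, s.serial#, s.username, s.status, s.last_call_et,
       p.spid "ServerPID", s.process "ClientPID",
       s.program, s.machine, s.sql_id
from   v$session s, v$process p
where  s.paddr = p.addr
and    s.type = 'USER'
and    s.status = 'ACTIVE'
and    s.last_call_et > 3600
order  by s.last_call_et desc;
```

      The ServerPID can then be checked at the OS level (e.g. with top or ps) to confirm that the process really is consuming CPU.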