Tuning & Tracing Parallel Execution (An Introduction) Doug Burns (dougburns@yahoo.com)
<ul><li>Don’t use it for small, fast tasks </li></ul><ul><ul><li>They won’t go much quicker </li></ul></ul><ul><ul><li>The...
<ul><li>The slower your I/O sub-system, the more benefit you are likely to see from PX </li></ul><ul><ul><li>But shouldn’t...
Tuning & Tracing Parallel Execution (An Introduction) Doug Burns (dougburns@yahoo.com) (oracledoug.blogspot.com) (doug.bur...
Introduction to Parallel Execution

An overview presentation covering the use of Oracle's PX functionality including some tips and traps. Detailed white paper at http://oracledoug.com/px.html

Published in: Technology, Business

  1. 1. Tuning & Tracing Parallel Execution (An Introduction) Doug Burns (dougburns@yahoo.com)
  2. 2. Introduction <ul><li>Introduction </li></ul><ul><li>Parallel Architecture </li></ul><ul><li>Configuration </li></ul><ul><li>Dictionary Views </li></ul><ul><li>Tracing and Wait Events </li></ul><ul><li>Conclusion </li></ul>
  3. 3. <ul><li>Parallel Query Option introduced in 7.1 </li></ul><ul><ul><li>Now called Parallel Execution </li></ul></ul><ul><li>Parallel Execution splits a single large task into multiple smaller tasks which are handled by separate processes running concurrently. </li></ul><ul><ul><li>Full Table Scans </li></ul></ul><ul><ul><li>Partition Scans </li></ul></ul><ul><ul><li>Sorts </li></ul></ul><ul><ul><li>Index Creation </li></ul></ul><ul><ul><li>And others … </li></ul></ul>Introduction
  4. 4. <ul><li>A little history </li></ul><ul><li>So why did so few sites implement PQO? </li></ul>Introduction <ul><ul><li>- Lack of understanding </li></ul></ul><ul><ul><li>- Leads to horrible early experiences </li></ul></ul><ul><ul><li>- Community's resistance to change </li></ul></ul><ul><ul><li>- Not useful in all environments </li></ul></ul><ul><ul><li>- Needs time and effort applied to the initial design! </li></ul></ul><ul><li>Isn’t Oracle’s Instance architecture parallel anyway? </li></ul>
  5. 5. <ul><li>Non-Parallel Architecture? </li></ul>Introduction
  6. 6. Parallel Architecture <ul><li>Introduction </li></ul><ul><li>Parallel Architecture </li></ul><ul><li>Configuration </li></ul><ul><li>Dictionary Views </li></ul><ul><li>Tracing and Wait Events </li></ul><ul><li>Conclusion </li></ul>
  7. 7. Parallel Architecture <ul><li>Non-Parallel </li></ul>Parallel Deg 2
  8. 8. <ul><li>The Degree of Parallelism (DOP) refers to the number of discrete threads of work </li></ul><ul><li>The default DOP for an Instance is calculated as </li></ul><ul><ul><li>cpu_count * parallel_threads_per_cpu </li></ul></ul><ul><ul><li>Used if I don’t specify a DOP in a hint or table definition </li></ul></ul><ul><li>The maximum number of PX slaves is :- </li></ul><ul><ul><li>DOP * 2 </li></ul></ul><ul><ul><li>Plus the Query Coordinator </li></ul></ul><ul><ul><li>But this is per Data Flow Operation </li></ul></ul><ul><ul><li>And the slaves will be re-used </li></ul></ul>Parallel Architecture
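The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the sample server (4 CPUs, 2 threads per CPU) are assumptions, not values from the presentation.

```python
def default_dop(cpu_count: int, parallel_threads_per_cpu: int) -> int:
    """Instance default DOP, used when no DOP is given in a hint or table definition."""
    return cpu_count * parallel_threads_per_cpu


def max_processes_per_dfo(dop: int) -> int:
    """Up to two slave sets of `dop` slaves each, plus the Query Coordinator."""
    return dop * 2 + 1


if __name__ == "__main__":
    # Hypothetical server: 4 CPUs, parallel_threads_per_cpu = 2
    dop = default_dop(cpu_count=4, parallel_threads_per_cpu=2)
    print("default DOP:", dop)
    print("processes for one Data Flow Operation:", max_processes_per_dfo(dop))
```

With a DOP of 2 the second function returns 5, which matches the "potentially 5 trace files" remark in the tracing section.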
  9. 9. Parallel Architecture <ul><li>Inter-process communication is through message buffers (also known as table queues) </li></ul><ul><li>These can be stored in the shared pool or the large pool </li></ul>
  10. 10. Parallel Architecture <ul><li>This slide intentionally left blank </li></ul>
  11. 11. <ul><li>Methods of invoking Parallel Execution </li></ul><ul><ul><li>Table / Index Level </li></ul></ul><ul><ul><ul><ul><li>ALTER TABLE emp PARALLEL(DEGREE 2); </li></ul></ul></ul></ul><ul><ul><li>Optimizer Hints </li></ul></ul><ul><ul><ul><ul><li>SELECT /*+ PARALLEL(emp) */ * </li></ul></ul></ul></ul><ul><ul><ul><ul><li>FROM emp; </li></ul></ul></ul></ul><ul><ul><ul><li>Note: using Parallel Execution implies that you will be using the Cost-based Optimiser </li></ul></ul></ul><ul><ul><ul><li>As usual, appropriate statistics are vital </li></ul></ul></ul><ul><ul><li>Statement Level </li></ul></ul><ul><ul><ul><ul><li>ALTER INDEX emp_idx_1 REBUILD </li></ul></ul></ul></ul><ul><ul><ul><li>PARALLEL 8; </li></ul></ul></ul>Parallel Architecture
  12. 12. Configuration <ul><li>Introduction </li></ul><ul><li>Parallel Architecture </li></ul><ul><li>Configuration </li></ul><ul><li>Dictionary Views </li></ul><ul><li>Tracing and Wait Events </li></ul><ul><li>Conclusion </li></ul>
  13. 13. <ul><li>parallel_automatic_tuning </li></ul><ul><ul><li>First introduced in Oracle 8i </li></ul></ul><ul><ul><li>This is the first parameter you should set - to TRUE </li></ul></ul><ul><ul><ul><li>An alternative point of view – don’t use it! </li></ul></ul></ul><ul><ul><ul><li>Deprecated in 10G and default is FALSE but much of the same functionality is implemented </li></ul></ul></ul><ul><ul><li>Ensures that message queues are stored in the Large Pool rather than the Shared Pool </li></ul></ul><ul><ul><li>It modifies the values of other parameters </li></ul></ul><ul><ul><li>As well as the 10g default values, the following sections show the values when parallel_automatic_tuning is set to TRUE on previous versions </li></ul></ul>Configuration
  14. 14. <ul><li>parallel_adaptive_multi_user </li></ul><ul><ul><li>First introduced in Oracle 8 </li></ul></ul><ul><ul><li>Default Value – FALSE (TRUE in 10g) </li></ul></ul><ul><ul><li>Automatic Tuning Default – TRUE </li></ul></ul><ul><ul><li>Designed for systems where PX is used by online users </li></ul></ul><ul><ul><li>As workload increases, new statements will have their degree of parallelism downgraded. </li></ul></ul>Configuration <ul><li>Effective Oracle by Design </li></ul><ul><ul><li>Tom Kyte </li></ul></ul><ul><ul><li>‘This provides the best of both worlds and what users expect from a system. They know that when it is busy, it will run slower.’ </li></ul></ul>
  15. 15. <ul><li>parallel_max_servers </li></ul><ul><ul><li>Default - cpu_count * parallel_threads_per_cpu * 2 (if using automatic PGA management) * 5 </li></ul></ul><ul><ul><ul><li>e.g. 1 CPU * 2 * 2 * 5 = 20 on my laptop </li></ul></ul></ul><ul><ul><li>The maximum number of parallel execution slaves available for all sessions in this instance. </li></ul></ul><ul><ul><li>Watch out for the processes trap! </li></ul></ul><ul><li>parallel_min_servers </li></ul><ul><ul><li>Default - 0 </li></ul></ul><ul><ul><li>May choose to increase this if PX usage is constant to reduce overhead of starting and stopping slave processes. </li></ul></ul>Configuration More on this subject in tomorrow’s presentation
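The default for parallel_max_servers quoted on this slide can be reproduced as a small calculation. A minimal sketch: the function name is made up for illustration, and treating the PGA factor as 1 when automatic PGA management is off is my reading of the slide's "* 2 (if using automatic PGA management)", not something the slide states outright.

```python
def default_parallel_max_servers(cpu_count: int,
                                 parallel_threads_per_cpu: int,
                                 auto_pga: bool = True) -> int:
    """Default from the slide: cpu_count * parallel_threads_per_cpu * 2 * 5.

    The factor of 2 applies only with automatic PGA management; using 1
    otherwise is an assumption made for this sketch.
    """
    pga_factor = 2 if auto_pga else 1
    return cpu_count * parallel_threads_per_cpu * pga_factor * 5


if __name__ == "__main__":
    # The laptop example from the slide: 1 CPU * 2 * 2 * 5
    print(default_parallel_max_servers(cpu_count=1, parallel_threads_per_cpu=2))
```

Comparing the result with the processes parameter is one way to spot the "processes trap" before the slaves are actually needed.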
  16. 16. <ul><li>parallel_execution_message_size </li></ul><ul><ul><li>Default Value – 2148 bytes </li></ul></ul><ul><ul><li>Automatic Tuning Default – 4 KB </li></ul></ul><ul><ul><li>Maximum size of a message buffer </li></ul></ul><ul><ul><li>May be worth increasing to 8 KB, depending on wait event analysis. </li></ul></ul><ul><ul><li>However, small increases in message size could lead to large increases in large pool memory requirements </li></ul></ul><ul><ul><li>Remember the DOP² relationship and multiple sessions </li></ul></ul>Configuration
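Reading the last bullet as a DOP-squared relationship (each of the DOP producers can send to each of the DOP consumers), the effect of a larger message size can be sketched roughly as below. This is a back-of-envelope model, not Oracle's actual sizing formula: `buffers_per_connection=3` and the function names are assumptions for illustration only. The real calculation is in the Data Warehousing Guide.

```python
def producer_consumer_connections(dop: int) -> int:
    """Every producer slave may talk to every consumer slave: DOP * DOP."""
    return dop * dop


def rough_buffer_bytes(dop: int,
                       message_size: int,
                       buffers_per_connection: int = 3,  # assumed, illustrative only
                       concurrent_sessions: int = 1) -> int:
    """Very rough message buffer memory estimate for a number of PX sessions."""
    return (producer_consumer_connections(dop)
            * buffers_per_connection
            * message_size
            * concurrent_sessions)


if __name__ == "__main__":
    for size in (2148, 4096, 8192):
        print(f"DOP 4, message size {size}: {rough_buffer_bytes(4, size)} bytes")
    # Doubling the DOP quadruples the estimate at a fixed message size
    print(rough_buffer_bytes(8, 8192) // rough_buffer_bytes(4, 8192))
```

The point of the sketch is the shape of the growth: memory scales linearly with message size and session count but quadratically with DOP.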
  17. 17. <ul><li>Metalink Note 201799.1 contains full details and guidance for setting all relevant parameters </li></ul><ul><li>Ensure that standard parameters are also set appropriately </li></ul><ul><ul><li>large_pool_size </li></ul></ul><ul><ul><ul><li>Modified by parallel_automatic_tuning </li></ul></ul></ul><ul><ul><ul><li>Calculation in Data Warehousing Guide </li></ul></ul></ul><ul><ul><ul><li>Can be monitored using v$sgastat </li></ul></ul></ul><ul><ul><li>processes </li></ul></ul><ul><ul><ul><li>Modified by parallel_automatic_tuning </li></ul></ul></ul><ul><ul><li>sort_area_size </li></ul></ul><ul><ul><ul><li>For best results use automatic PGA management </li></ul></ul></ul><ul><ul><ul><li>Be aware of _smm_px_max_size </li></ul></ul></ul>Configuration
  18. 18. Dictionary Views <ul><li>Introduction </li></ul><ul><li>Parallel Architecture </li></ul><ul><li>Configuration </li></ul><ul><li>Dictionary Views </li></ul><ul><li>Tracing and Wait Events </li></ul><ul><li>Conclusion </li></ul>
  19. 19. <ul><li>Parallel-specific Dictionary Views </li></ul><ul><ul><li>SELECT table_name </li></ul></ul><ul><ul><li>FROM dict </li></ul></ul><ul><ul><li>WHERE table_name LIKE 'V%PQ%' OR table_name LIKE 'V%PX%'; </li></ul></ul><ul><ul><li>TABLE_NAME </li></ul></ul><ul><ul><li>------------------------------ </li></ul></ul><ul><ul><li>V$PQ_SESSTAT </li></ul></ul><ul><ul><li>V$PQ_SYSSTAT </li></ul></ul><ul><ul><li>V$PQ_SLAVE </li></ul></ul><ul><ul><li>V$PQ_TQSTAT </li></ul></ul><ul><ul><li>V$PX_BUFFER_ADVICE </li></ul></ul><ul><ul><li>V$PX_SESSION </li></ul></ul><ul><ul><li>V$PX_SESSTAT </li></ul></ul><ul><ul><li>V$PX_PROCESS </li></ul></ul><ul><ul><li>V$PX_PROCESS_SYSSTAT </li></ul></ul><ul><ul><li>Also GV$PQ_SESSTAT and GV$PQ_TQSTAT with INST_ID </li></ul></ul>Dictionary Views
  20. 20. <ul><li>v$pq_sesstat </li></ul><ul><ul><li>Provides statistics relating to the current session </li></ul></ul><ul><ul><li>Useful for verifying that a specific query is using parallel execution as expected </li></ul></ul><ul><li> SELECT * FROM v$pq_sesstat; </li></ul><ul><ul><li>STATISTIC LAST_QUERY SESSION_TOTAL </li></ul></ul><ul><ul><li>------------------------------ ---------- ------------- </li></ul></ul><ul><ul><li>Queries Parallelized 1 1 </li></ul></ul><ul><ul><li>DML Parallelized 0 0 </li></ul></ul><ul><ul><li>DDL Parallelized 0 0 </li></ul></ul><ul><ul><li>DFO Trees 1 1 </li></ul></ul><ul><ul><li>Server Threads 3 0 </li></ul></ul><ul><ul><li>Allocation Height 3 0 </li></ul></ul><ul><ul><li>Allocation Width 1 0 </li></ul></ul><ul><ul><li>Local Msgs Sent 217 217 </li></ul></ul><ul><ul><li>Distr Msgs Sent 0 0 </li></ul></ul><ul><ul><li>Local Msgs Recv'd 217 217 </li></ul></ul><ul><ul><li>Distr Msgs Recv'd 0 0 </li></ul></ul>Dictionary Views
  21. 21. <ul><li>v$pq_sysstat </li></ul><ul><ul><li>The instance-level overview </li></ul></ul><ul><ul><li>Various values, including information to help set parallel_min_servers and parallel_max_servers </li></ul></ul><ul><ul><li>v$px_process_sysstat contains similar information </li></ul></ul><ul><ul><li>SELECT * FROM v$pq_sysstat WHERE statistic LIKE 'Servers%'; </li></ul></ul><ul><ul><li>STATISTIC VALUE </li></ul></ul><ul><ul><li>------------------------------ ---------- </li></ul></ul><ul><ul><li>Servers Busy 0 </li></ul></ul><ul><ul><li>Servers Idle 0 </li></ul></ul><ul><ul><li>Servers Highwater 3 </li></ul></ul><ul><ul><li>Server Sessions 3 </li></ul></ul><ul><ul><li>Servers Started 3 </li></ul></ul><ul><ul><li>Servers Shutdown 3 </li></ul></ul><ul><ul><li>Servers Cleaned Up 0 </li></ul></ul>Dictionary Views
  22. 22. <ul><li>v$pq_slave </li></ul><ul><ul><li>Gives information on the activity of individual PX slaves </li></ul></ul><ul><ul><li>v$px_process contains similar information </li></ul></ul><ul><ul><li>SELECT slave_name, status, sessions, msgs_sent_total, msgs_rcvd_total </li></ul></ul><ul><ul><li>FROM v$pq_slave; </li></ul></ul><ul><ul><li>SLAV STAT SESSIONS MSGS_SENT_TOTAL MSGS_RCVD_TOTAL </li></ul></ul><ul><ul><li>---- ---- ---------- --------------- --------------- </li></ul></ul><ul><ul><li>P000 BUSY 3 465 508 </li></ul></ul><ul><ul><li>P001 BUSY 3 356 290 </li></ul></ul><ul><ul><li>P002 BUSY 3 153 78 </li></ul></ul><ul><ul><li>P003 BUSY 3 108 63 </li></ul></ul><ul><ul><li>P004 IDLE 2 249 97 </li></ul></ul><ul><ul><li>P005 IDLE 2 246 97 </li></ul></ul><ul><ul><li>P006 IDLE 2 239 95 </li></ul></ul><ul><ul><li>P007 IDLE 2 249 96 </li></ul></ul>Dictionary Views
  23. 23. <ul><li>v$pq_tqstat </li></ul><ul><ul><li>Shows communication relationship between slaves </li></ul></ul><ul><ul><li>Must be executed from a session that’s been using parallel operations – refers to this session </li></ul></ul><ul><ul><li>Example 1 – Attendance Table (25,481 rows) </li></ul></ul><ul><ul><li>break on dfo_number on tq_id </li></ul></ul><ul><ul><li>SELECT /*+ PARALLEL (attendance, 4) */ * </li></ul></ul><ul><ul><li>FROM attendance; </li></ul></ul><ul><ul><li>SELECT dfo_number, tq_id, server_type, process, num_rows, bytes </li></ul></ul><ul><ul><li>FROM v$pq_tqstat </li></ul></ul><ul><ul><li>ORDER BY dfo_number DESC, tq_id, server_type DESC, process; </li></ul></ul><ul><ul><li>DFO_NUMBER TQ_ID SERVER_TYP PROCESS NUM_ROWS BYTES </li></ul></ul><ul><ul><li>---------- ---------- ---------- ---------- ---------- ---------- </li></ul></ul><ul><ul><li> 1 0 Producer P000 6605 114616 </li></ul></ul><ul><ul><li>Producer P001 6102 105653 </li></ul></ul><ul><ul><li>Producer P002 6251 110311 </li></ul></ul><ul><ul><li>Producer P003 6523 113032 </li></ul></ul><ul><ul><li>Consumer QC 25481 443612 </li></ul></ul>Dictionary Views
  24. 24. <ul><li>Example 2 - with a sort operation </li></ul><ul><ul><li>SELECT /*+ PARALLEL (attendance, 4) */ * </li></ul></ul><ul><ul><li>FROM attendance </li></ul></ul><ul><ul><li>ORDER BY amount_paid; </li></ul></ul><ul><ul><li>DFO_NUMBER TQ_ID SERVER_TYP PROCESS NUM_ROWS BYTES </li></ul></ul><ul><ul><li>---------- ---------- ---------- ---------- ---------- ---------- </li></ul></ul><ul><ul><li>1 0 Ranger QC 372 13322 </li></ul></ul><ul><ul><li> Producer P004 5744 100069 </li></ul></ul><ul><ul><li>Producer P005 6304 110167 </li></ul></ul><ul><ul><li>Producer P006 6303 109696 </li></ul></ul><ul><ul><li>Producer P007 7130 124060 </li></ul></ul><ul><ul><li> Consumer P000 15351 261380 </li></ul></ul><ul><ul><li> Consumer P001 10129 182281 </li></ul></ul><ul><ul><li>Consumer P002 0 103 </li></ul></ul><ul><ul><li>Consumer P003 1 120 </li></ul></ul><ul><ul><li>1 Producer P000 15351 261317 </li></ul></ul><ul><ul><li>Producer P001 10129 182238 </li></ul></ul><ul><ul><li>Producer P002 0 20 </li></ul></ul><ul><ul><li>Producer P003 1 37 </li></ul></ul><ul><ul><li> Consumer QC 25481 443612 </li></ul></ul>Dictionary Views
  25. 25. <ul><li>So why the unbalanced slaves? </li></ul><ul><ul><li>Check the list of distinct values in amount_paid </li></ul></ul><ul><ul><ul><ul><li>SELECT amount_paid, COUNT(*) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>FROM attendance </li></ul></ul></ul></ul><ul><ul><ul><ul><li>GROUP BY amount_paid </li></ul></ul></ul></ul><ul><ul><ul><ul><li>ORDER BY amount_paid </li></ul></ul></ul></ul><ul><ul><ul><ul><li>/ </li></ul></ul></ul></ul><ul><ul><ul><ul><li>  </li></ul></ul></ul></ul><ul><ul><ul><ul><li>AMOUNT_PAID COUNT(*) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>----------- ---------- </li></ul></ul></ul></ul><ul><ul><ul><ul><li>200 1 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>850 1 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>900 1 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>1000 7 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>1150 1 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>1200 15340 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>1995 10129 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>4000 1 </li></ul></ul></ul></ul>Dictionary Views
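The imbalance can be reproduced offline from the value distribution. In the sketch below the row counts are copied from the query output on this slide, but the range boundaries are hypothetical: they are chosen only so that the result lines up with the consumer row counts in Example 2, and the real boundaries picked by the Ranger are not shown in the presentation.

```python
from bisect import bisect_left

# Distinct amount_paid values and their row counts, from the query output.
VALUE_COUNTS = {
    200: 1, 850: 1, 900: 1, 1000: 7, 1150: 1,
    1200: 15340, 1995: 10129, 4000: 1,
}

# HYPOTHETICAL inclusive upper bounds for the first three sort ranges;
# anything above the last bound falls to the fourth consumer.
RANGE_UPPER_BOUNDS = [1200, 1995, 3000]


def rows_per_consumer(value_counts, upper_bounds):
    """Assign every distinct value to one range and total the rows per range."""
    totals = [0] * (len(upper_bounds) + 1)
    for value, count in value_counts.items():
        totals[bisect_left(upper_bounds, value)] += count
    return totals


if __name__ == "__main__":
    for slave, rows in zip(("P000", "P001", "P002", "P003"),
                           rows_per_consumer(VALUE_COUNTS, RANGE_UPPER_BOUNDS)):
        print(slave, rows)
```

Because all rows with the same sort key must go to the same consumer, two very popular values are enough to leave two of the four consumers almost idle, whatever boundaries are chosen.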
  26. 26. <ul><li>v$px_session and v$px_sesstat </li></ul><ul><ul><li>Query to show slaves and physical reads </li></ul></ul><ul><ul><li>break on qcsid on server_set </li></ul></ul><ul><ul><li>SELECT stat.qcsid, stat.server_set, stat.server#, nam.name, stat.value </li></ul></ul><ul><ul><li>FROM v$px_sesstat stat, v$statname nam </li></ul></ul><ul><ul><li>WHERE stat.statistic# = nam.statistic# </li></ul></ul><ul><ul><li>AND nam.name = 'physical reads' </li></ul></ul><ul><ul><li>ORDER BY 1,2,3; </li></ul></ul><ul><ul><li>QCSID SERVER_SET SERVER# NAME VALUE </li></ul></ul><ul><ul><li>---------- ---------- ---------- -------------------- ---------- </li></ul></ul><ul><ul><li>145 1 1 physical reads 0 </li></ul></ul><ul><ul><li>2 physical reads 0 </li></ul></ul><ul><ul><li>3 physical reads 0 </li></ul></ul><ul><ul><li>2 1 physical reads 63 </li></ul></ul><ul><ul><li>2 physical reads 56 </li></ul></ul><ul><ul><li>3 physical reads 61 </li></ul></ul><ul><ul><li>physical reads 4792 </li></ul></ul>Dictionary Views
  27. 27. <ul><li>v$px_process </li></ul><ul><ul><li>Shows parallel execution slave processes, status and session information </li></ul></ul><ul><ul><li>SELECT * FROM v$px_process; </li></ul></ul><ul><ul><li>SERV STATUS PID SPID SID SERIAL# </li></ul></ul><ul><ul><li>---- --------- ---------- ------------ ---------- ---------- </li></ul></ul><ul><ul><li>P001 IN USE 18 7680 144 17 </li></ul></ul><ul><ul><li>P004 IN USE 20 7972 146 11 </li></ul></ul><ul><ul><li>P005 IN USE 21 8040 148 25 </li></ul></ul><ul><ul><li>P000 IN USE 16 7628 150 16 </li></ul></ul><ul><ul><li>P006 IN USE 24 8100 151 66 </li></ul></ul><ul><ul><li>P003 IN USE 19 7896 152 30 </li></ul></ul><ul><ul><li>P007 AVAILABLE 25 5804 </li></ul></ul><ul><ul><li>P002 AVAILABLE 12 6772 </li></ul></ul>Dictionary Views
  28. 28. <ul><li>Monitoring the SQL being executed by slaves </li></ul><ul><ul><li>set pages 0 </li></ul></ul><ul><ul><li>column sql_text format a60 </li></ul></ul><ul><ul><li>  </li></ul></ul><ul><ul><li>select p.server_name, </li></ul></ul><ul><ul><li>sql.sql_text </li></ul></ul><ul><ul><li>from v$px_process p, v$sql sql, v$session s </li></ul></ul><ul><ul><li>WHERE p.sid = s.sid AND p.serial# = s.serial# </li></ul></ul><ul><ul><li>AND s.sql_address = sql.address AND s.sql_hash_value = sql.hash_value </li></ul></ul><ul><ul><li>/ </li></ul></ul><ul><ul><li>9i Results </li></ul></ul><ul><ul><li>P001 SELECT A1.C0 C0,A1.C1 C1,A1.C2 C2,A1.C3 C3,A1.C4 C4,A1.C5 C5, </li></ul></ul><ul><ul><ul><li>A1.C6 C6,A1.C7 C7 FROM :Q3000 A1 ORDER BY A1.C0 </li></ul></ul></ul><ul><ul><li>10g Results </li></ul></ul><ul><ul><li>P001 SELECT /*+ PARALLEL (attendance, 2) */ * FROM attendance </li></ul></ul><ul><ul><li>ORDER BY amount_paid </li></ul></ul>Dictionary Views
  29. 29. <ul><li>Additional information in standard Dictionary Views </li></ul><ul><ul><li>e.g. v$sysstat </li></ul></ul><ul><ul><li>SELECT name, value FROM v$sysstat WHERE name LIKE 'PX%'; </li></ul></ul><ul><ul><li>NAME VALUE </li></ul></ul><ul><ul><li>---------------------------------------------- ---------- </li></ul></ul><ul><ul><li>PX local messages sent 4895 </li></ul></ul><ul><ul><li>PX local messages recv'd 4892 </li></ul></ul><ul><ul><li>PX remote messages sent 0 </li></ul></ul><ul><ul><li>PX remote messages recv'd 0 </li></ul></ul>Dictionary Views
  30. 30. <ul><li>Monitoring the adaptive multi-user algorithm </li></ul><ul><ul><li>We need to be able to check whether operations are being downgraded and by how much </li></ul></ul><ul><ul><li>Downgraded to serial could be a particular problem! </li></ul></ul><ul><li>SELECT name, value FROM v$sysstat WHERE name LIKE 'Parallel%' </li></ul><ul><li>NAME VALUE </li></ul><ul><li>---------------------------------------------------------------- ---------- </li></ul><ul><li>Parallel operations not downgraded 546353 </li></ul><ul><li>Parallel operations downgraded to serial 432 </li></ul><ul><li>Parallel operations downgraded 75 to 99 pct 790 </li></ul><ul><li>Parallel operations downgraded 50 to 75 pct 1454 </li></ul><ul><li>Parallel operations downgraded 25 to 50 pct 7654 </li></ul><ul><li>Parallel operations downgraded 1 to 25 pct 11873 </li></ul>Dictionary Views <ul><li>Monitoring the adaptive multi-user algorithm </li></ul><ul><ul><li>We need to be able to check whether operations are being downgraded and by how much </li></ul></ul><ul><ul><li>Downgraded to serial could be a particular problem! </li></ul></ul><ul><li>SELECT name, value FROM v$sysstat WHERE name LIKE 'Parallel%' </li></ul><ul><li>NAME VALUE </li></ul><ul><li>------------------ ---------------------------------------------- ---------- </li></ul><ul><li>Parallel operations not downgraded 546353 </li></ul><ul><li>P*ssed-off users 432 </li></ul><ul><li>Parallel operations downgraded 75 to 99 pct 790 </li></ul><ul><li>Parallel operations downgraded 50 to 75 pct 1454 </li></ul><ul><li>Parallel operations downgraded 25 to 50 pct 7654 </li></ul><ul><li>Parallel operations downgraded 1 to 25 pct 11873 </li></ul>
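Raw counters are easier to judge as percentages. A minimal sketch using the figures from the sample v$sysstat output on this slide; the dictionary and function names are illustrative, and in practice the values would come from the query rather than being typed in.

```python
# Counters copied from the sample v$sysstat output.
DOWNGRADE_STATS = {
    "Parallel operations not downgraded": 546353,
    "Parallel operations downgraded to serial": 432,
    "Parallel operations downgraded 75 to 99 pct": 790,
    "Parallel operations downgraded 50 to 75 pct": 1454,
    "Parallel operations downgraded 25 to 50 pct": 7654,
    "Parallel operations downgraded 1 to 25 pct": 11873,
}


def downgrade_percentages(stats):
    """Return each counter as a percentage of all parallel operations."""
    total = sum(stats.values())
    return {name: 100.0 * value / total for name, value in stats.items()}


if __name__ == "__main__":
    for name, pct in downgrade_percentages(DOWNGRADE_STATS).items():
        print(f"{name:<45} {pct:7.3f}%")
```

A small percentage can still hurt: each operation downgraded to serial may be the one job the rest of a batch schedule is waiting for.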
  31. 31. <ul><li>Statspack </li></ul><ul><ul><li>Example Report (Excerpt) </li></ul></ul><ul><ul><li>During overnight batch operation </li></ul></ul><ul><ul><li>Mainly Bitmap Index creation </li></ul></ul><ul><ul><li>Slightly difficult to read </li></ul></ul><ul><ul><ul><li>Parallel operations downgraded 1 0 </li></ul></ul></ul><ul><ul><ul><li>Parallel operations downgraded 25 0 </li></ul></ul></ul><ul><ul><ul><li>Parallel operations downgraded 50 7 </li></ul></ul></ul><ul><ul><ul><li>Parallel operations downgraded 75 38 </li></ul></ul></ul><ul><ul><ul><li>Parallel operations downgraded to 1 </li></ul></ul></ul><ul><ul><ul><li>Parallel operations not downgrade 22 </li></ul></ul></ul><ul><ul><li>With one stream downgraded to serial, the rest of the schedule may depend on this one job. </li></ul></ul>Dictionary Views
  32. 32. Tracing and Wait Events <ul><li>Introduction </li></ul><ul><li>Parallel Architecture </li></ul><ul><li>Configuration </li></ul><ul><li>Dictionary Views </li></ul><ul><li>Tracing and Wait Events </li></ul><ul><li>Conclusion </li></ul>
  33. 33. <ul><li>Tracing Parallel Execution operations is more complicated than standard tracing </li></ul><ul><ul><li>One trace file per slave (as well as the query coordinator) </li></ul></ul><ul><ul><li>Potentially 5 trace files even with a DOP of 2 </li></ul></ul><ul><ul><li>May be in background_dump_dest or user_dump_dest (usually background_dump_dest) </li></ul></ul>Tracing and Wait Events <ul><li>Optimizing Oracle Performance </li></ul><ul><ul><li>Millsap and Holt </li></ul></ul><ul><ul><li>‘ The remaining task is to identify and analyze all of the relevant trace files. This task is usually simple …’ </li></ul></ul><ul><li>                                                          </li></ul>
  34. 34. <ul><li>Much simpler in 10g </li></ul><ul><ul><li>Use trcsess to generate a consolidated trace file for QC and all slaves </li></ul></ul><ul><ul><li>exec dbms_session.set_identifier('PX_TEST'); </li></ul></ul><ul><ul><li>REM tracefile_identifier is optional, but might make things easier for you </li></ul></ul><ul><ul><li>alter session set tracefile_identifier='PX_TEST'; </li></ul></ul><ul><ul><li>exec dbms_monitor.client_id_trace_enable('PX_TEST'); </li></ul></ul><ul><ul><li>REM DO WORK </li></ul></ul><ul><ul><li>exec dbms_monitor.client_id_trace_disable('PX_TEST'); </li></ul></ul><ul><ul><li>Generate the consolidated trace file, then run it through tkprof (both from the operating system shell) </li></ul></ul><ul><ul><li>trcsess output=/ora/admin/TEST1020/udump/PX_TEST.trc clientid=PX_TEST /ora/admin/TEST1020/udump/*px_test*.trc /ora/admin/TEST1020/bdump/*.trc </li></ul></ul><ul><ul><li>tkprof /ora/admin/TEST1020/udump/PX_TEST.trc /ora/admin/TEST1020/udump/PX_TEST.out </li></ul></ul>Tracing and Wait Events
  35. 35. <ul><li>This is what one of the slaves looks like </li></ul><ul><ul><li>C:\oracle\product\10.2.0\admin\ORCL\udump>cd ../bdump </li></ul></ul><ul><ul><li>C:\oracle\product\10.2.0\admin\ORCL\bdump>more orcl_p000_2748.trc </li></ul></ul><ul><ul><li><SNIPPED> </li></ul></ul><ul><ul><li>*** SERVICE NAME:(SYS$USERS) 2006-03-07 10:57:29.812 </li></ul></ul><ul><ul><li>*** CLIENT ID:(PX_TEST) 2006-03-07 10:57:29.812 </li></ul></ul><ul><ul><li>*** SESSION ID:(151.24) 2006-03-07 10:57:29.812 </li></ul></ul><ul><ul><li>WAIT #0: nam= 'PX Deq: Msg Fragment' ela= 13547 sleeptime/senderid=268566527 passes=1 p3=0 obj#=-1 tim=3408202924 </li></ul></ul><ul><ul><li>===================== </li></ul></ul><ul><ul><li>PARSING IN CURSOR #1 len=60 dep=1 uid=70 oct=3 lid=70 tim=3408244715 hv=1220056081 ad='6cc64000' </li></ul></ul><ul><ul><li>select /*+ parallel(test_tab3, 2) */ count(*) </li></ul></ul><ul><ul><li>from test_tab3 </li></ul></ul><ul><ul><li>END OF STMT </li></ul></ul>Tracing and Wait Events
  36. 36. <ul><li>Many more wait events and more time spent waiting </li></ul><ul><ul><li>The various processes need to communicate with each other </li></ul></ul><ul><ul><li>Metalink Note 191103.1 lists the wait events related to Parallel Execution </li></ul></ul><ul><ul><li>But be careful of what ‘Idle’ means </li></ul></ul>Tracing and Wait Events
  37. 37. <ul><li>Events indicating consumers or QC are waiting for data from producers </li></ul><ul><ul><li>PX Deq: Execute Reply </li></ul></ul><ul><ul><li>PX Deq: Table Q Normal </li></ul></ul><ul><li>Although considered idle events, if these waits are excessive, it could indicate a problem in the performance of the slaves </li></ul><ul><li>Investigate the slave trace files </li></ul><ul><li>                                                         </li></ul>Tracing and Wait Events
  38. 38. <ul><li>Events indicating producers are quicker than consumers (or QC) </li></ul><ul><ul><li>PX qref latch </li></ul></ul><ul><li>Try increasing parallel_execution_message_size as this might reduce the communications overhead </li></ul><ul><li>Although it could make things worse if the consumer is just taking time to process the incoming data. </li></ul><ul><li>                                                         </li></ul>Tracing and Wait Events
  39. 39. <ul><li>Messaging Events </li></ul><ul><ul><li>PX Deq Credit: need buffer </li></ul></ul><ul><ul><li>PX Deq Credit: send blkd </li></ul></ul><ul><li>Although there may be many waits, the time spent should not be a problem. </li></ul><ul><li>If it is, perhaps you have an extremely busy server that is struggling to cope </li></ul><ul><ul><li>Reduce DOP? </li></ul></ul><ul><ul><li>Increase parallel_execution_message_size? </li></ul></ul><ul><ul><li>Don’t use PX? </li></ul></ul>Tracing and Wait Events
  40. 40. <ul><li>Query Coordinator waiting for the slaves to parse their SQL statements </li></ul><ul><ul><li>PX Deq: Parse Reply </li></ul></ul><ul><li>If there are any significant waits for this event, this may indicate you have shared pool resource issues. </li></ul><ul><li>Or you’ve encountered a bug! </li></ul>Tracing and Wait Events
  41. 41. <ul><li>Partial Message Event </li></ul><ul><ul><li>PX Deq: Msg Fragment </li></ul></ul><ul><li>May be eliminated or improved by increasing parallel_execution_message_size </li></ul><ul><li>Not an issue on recent tests </li></ul>Tracing and Wait Events
  42. 42. <ul><li>Example </li></ul><ul><ul><li>Excerpt from an overnight Statspack Report </li></ul></ul><ul><ul><li>Event | Waits | Timeouts | Total Wait Time (s) | Avg Wait (ms) | Waits/txn </li></ul></ul><ul><ul><li>direct path read | 2,249,666 | 0 | 115,813 | 51 | 25.5 </li></ul></ul><ul><ul><li>PX Deq: Execute Reply | 553,797 | 22,006 | 75,910 | 137 | 6.3 </li></ul></ul><ul><ul><li>PX qref latch | 77,461 | 39,676 | 42,257 | 546 | 0.9 </li></ul></ul><ul><ul><li>library cache pin | 27,877 | 10,404 | 31,422 | 1127 | 0.3 </li></ul></ul><ul><ul><li>db file scattered read | 1,048,135 | 0 | 25,144 | 24 | 11.9 </li></ul></ul><ul><ul><li>Direct Path Reads </li></ul></ul><ul><ul><ul><li>Sort I/O </li></ul></ul></ul><ul><ul><ul><li>Read-ahead </li></ul></ul></ul><ul><ul><ul><li>PX Slave I/O </li></ul></ul></ul><ul><ul><ul><li>The average wait time – SAN! </li></ul></ul></ul>Tracing and Wait Events
  43. 43. <table><tr><th>Event</th><th>Waits</th><th>Timeouts</th><th>Total Wait Time (s)</th><th>Avg Wait (ms)</th><th>Waits /txn</th></tr><tr><td>direct path read</td><td>2,249,666</td><td>0</td><td>115,813</td><td>51</td><td>25.5</td></tr><tr><td>PX Deq: Execute Reply</td><td>553,797</td><td>22,006</td><td>75,910</td><td>137</td><td>6.3</td></tr><tr><td>PX qref latch</td><td>77,461</td><td>39,676</td><td>42,257</td><td>546</td><td>0.9</td></tr><tr><td>library cache pin</td><td>27,877</td><td>10,404</td><td>31,422</td><td>1127</td><td>0.3</td></tr><tr><td>db file scattered read</td><td>1,048,135</td><td>0</td><td>25,144</td><td>24</td><td>11.9</td></tr></table><ul><ul><li>PX Deq: Execute Reply </li></ul></ul><ul><ul><ul><li>Idle event – QC waiting for a response from slaves </li></ul></ul></ul><ul><ul><ul><li>Some waiting is inevitable </li></ul></ul></ul><ul><ul><li>PX qref latch </li></ul></ul><ul><ul><ul><li>Largely down to the extreme use of Parallel Execution </li></ul></ul></ul><ul><ul><ul><li>Practically unavoidable but perhaps we could increase parallel_execution_message_size? </li></ul></ul></ul><ul><ul><li>Library cache pin? </li></ul></ul><ul><ul><ul><li>Need to look at the trace files </li></ul></ul></ul>Tracing and Wait Events
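Because PX Deq: Execute Reply is an idle event, it can be set aside before deciding where the time really went. The sketch below does that for the five rows shown above. The idle list contains only the event this slide calls idle, which is an assumption made for illustration, and the percentages are shares of this excerpt only, not of the full report. The names `IDLE_EVENTS` and `non_idle_shares` are hypothetical.

```python
# Rows copied from the Statspack excerpt above:
# (event name, number of waits, total wait time in seconds)
ROWS = [
    ("direct path read", 2_249_666, 115_813),
    ("PX Deq: Execute Reply", 553_797, 75_910),
    ("PX qref latch", 77_461, 42_257),
    ("library cache pin", 27_877, 31_422),
    ("db file scattered read", 1_048_135, 25_144),
]

# Assumption: only the event the slide describes as idle is excluded.
IDLE_EVENTS = {"PX Deq: Execute Reply"}


def non_idle_shares(rows, idle=IDLE_EVENTS):
    """Return (event, percent of non-idle wait time), largest first."""
    busy = [(event, time_s) for event, _, time_s in rows if event not in idle]
    total = sum(time_s for _, time_s in busy)
    shares = [(event, 100 * time_s / total) for event, time_s in busy]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for event, pct in non_idle_shares(ROWS):
        print(f"{event:<24}{pct:5.1f}%")
```

With the idle event removed, direct path read accounts for a little over half of the remaining wait time in the excerpt, followed by PX qref latch and library cache pin.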
  44. 44. Conclusion <ul><li>Introduction </li></ul><ul><li>Parallel Architecture </li></ul><ul><li>Configuration </li></ul><ul><li>Dictionary Views </li></ul><ul><li>Tracing and Wait Events </li></ul><ul><li>Conclusion </li></ul>
  45. 45. <ul><li>Plan / Test / Implement </li></ul><ul><ul><li>Asking for trouble if you don’t! </li></ul></ul><ul><li>Hardware </li></ul><ul><ul><li>It’s designed to suck the server dry </li></ul></ul><ul><ul><li>Trying to squeeze a quart into a pint pot will make things slow down due to contention </li></ul></ul><ul><li>Tune the SQL first </li></ul><ul><ul><li>All the old rules apply </li></ul></ul><ul><ul><li>The biggest improvements come from doing less unnecessary work in the first place </li></ul></ul><ul><ul><li>Even if PX does make things go quickly enough, it’s going to use a lot more resources doing so </li></ul></ul>Conclusion
  46. 46. <ul><li>Don’t use it for small, fast tasks </li></ul><ul><ul><li>They won’t go much quicker </li></ul></ul><ul><ul><li>They might go slower </li></ul></ul><ul><ul><li>They will use more resources </li></ul></ul><ul><li>Don’t use it for online systems </li></ul><ul><ul><li>Not unless it’s a handful of users </li></ul></ul><ul><ul><li>With a predictable maximum number of concurrent activities </li></ul></ul><ul><ul><li>Who understand the implications and won’t go crazy when something takes four times as long as normal! </li></ul></ul><ul><ul><li>It gives a false initial perception of high performance and isn’t scalable </li></ul></ul><ul><ul><li>Okay, Tom, set parallel_adaptive_multi_user to TRUE </li></ul></ul>Conclusion
  47. 47. <ul><li>The slower your I/O sub-system, the more benefit you are likely to see from PX </li></ul><ul><ul><li>But shouldn’t you fix the underlying problem? </li></ul></ul><ul><ul><li>More on this in the next presentation </li></ul></ul><ul><li>Consider whether PX is the correct parallel solution for overnight batch operations </li></ul><ul><ul><li>A single stream of parallel jobs? </li></ul></ul><ul><ul><li>Parallel streams of single-threaded jobs? </li></ul></ul><ul><ul><li>Unfortunately you’ll probably have to do some work to prove your ideas! </li></ul></ul>Conclusion
  48. 48. Tuning & Tracing Parallel Execution (An Introduction) Doug Burns (dougburns@yahoo.com) (oracledoug.blogspot.com) (doug.burns.tripod.com)
