Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Analysing and troubleshooting Parallel Execution IT Tage 2015

666 views

Published on

My presentation "Analysing and troubleshooting Parallel Execution" held at IT Tage Conference 2015, Frankfurt

  • Be the first to comment

Analysing and troubleshooting Parallel Execution IT Tage 2015

  1. 1.  Independent consultant  Performance Troubleshooting  In-house workshops  Cost-Based Optimizer  Performance By Design  Oracle ACE Director  Member of OakTable Network
  2. 2.  Parallel Execution introduction  Major challenges  Parallel unfriendly examples  Distribution skew examples  How to measure distribution of work  Fixing work distribution skew
  3. 3.  Oracle Database Enterprise Edition includes the powerful Parallel Execution feature that allows spreading the processing of a single SQL statement execution across multiple worker processes  The feature is fully integrated into the Cost Based Optimizer as well as the execution runtime engine and automatically distributes the work across the so called Parallel Workers
  4. 4.  Simple generic parallelization example Task: Compute sum of 8 numbers 1+8=9, 9+7=16, 16+9=25,... 1+8+7+9+6+2+6+3= ??? n=8 numbers, 7 computation steps required Serial execution: 7 time units
  5. 5. Simple generic parallelization example 4 workers 3+ time units 1 + 8 = 9 9 + 7 = 16 6 + 2 = 8 6 + 3 = 9 9 + 16 = 25 8 + 9 = 17 25 + 17 = 42 Coordinator
  6. 6. Parallel Execution doesn’t mean “work smarter” You’re actually willing to accept to “work harder” Could also be called: “Brute force” approach
  7. 7. So with Parallel Execution there might be the problem that it doesn’t work “hard enough”
  8. 8.  Two major challenges  Can the given task be divided into sub-tasks that can efficiently and independently be processed by the workers? (“Parallel Unfriendly”)  Can all assigned workers be kept busy all the time?
  9. 9.  More reasons why Oracle Parallel Execution might not reduce runtime as expected:  Parallel DML/DDL gotchas  “Downgrade” at execution time (less workers assigned than expected)  Overhead of Parallel Execution implementation  Limitations of Parallel Execution implementation
  10. 10. Parallel DML / DDL gotchas  DML / DDL part can run parallel or serial  Query part can run parallel or serial
  11. 11. Parallel CTAS but serial query ------------------------------------------------------------------------- | Id | Operation | Name | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------- | 0 | CREATE TABLE STATEMENT | | | | | | 1 | PX COORDINATOR | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | Q1,01 | P->S | QC (RAND) | | 3 | LOAD AS SELECT | T4 | Q1,01 | PCWP | | | 4 | PX RECEIVE | | Q1,01 | PCWP | | | 5 | PX SEND ROUND-ROBIN| :TQ10000 | | S->P | RND-ROBIN | |* 6 | HASH JOIN | | | | | | 7 | TABLE ACCESS FULL| T2 | | | | | 8 | TABLE ACCESS FULL| T2 | | | | -------------------------------------------------------------------------
  12. 12. Serial CTAS but parallel query -------------------------------------------------------------------------- | Id | Operation | Name | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------- | 0 | CREATE TABLE STATEMENT | | | | | | 1 | LOAD AS SELECT | T4 | | | | | 2 | PX COORDINATOR | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10002 | Q1,02 | P->S | QC (RAND) | |* 4 | HASH JOIN BUFFERED | | Q1,02 | PCWP | | | 5 | PX RECEIVE | | Q1,02 | PCWP | | | 6 | PX SEND HASH | :TQ10000 | Q1,00 | P->P | HASH | | 7 | PX BLOCK ITERATOR | | Q1,00 | PCWC | | | 8 | TABLE ACCESS FULL| T2 | Q1,00 | PCWP | | | 9 | PX RECEIVE | | Q1,02 | PCWP | | | 10 | PX SEND HASH | :TQ10001 | Q1,01 | P->P | HASH | | 11 | PX BLOCK ITERATOR | | Q1,01 | PCWC | | | 12 | TABLE ACCESS FULL| T2 | Q1,01 | PCWP | | --------------------------------------------------------------------------
  13. 13.  Other reasons why Oracle Parallel Execution might not scale as expected:  Parallel DML/DDL gotchas  “Downgrade” at execution time (less workers assigned than expected)  Overhead of Parallel Execution implementation  Limitations of Parallel Execution implementation
  14. 14. “Parallel Forced Serial” Example ------------------------------------------------------------------------------ | Id | Operation | Name | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | | | | | 1 | PX COORDINATOR FORCED SERIAL| | | | | | 2 | PX SEND QC (RANDOM) | :TQ10003 | Q1,03 | P->S | QC (RAND) | | 3 | HASH UNIQUE | | Q1,03 | PCWP | | | 4 | PX RECEIVE | | Q1,03 | PCWP | | | 5 | PX SEND HASH | :TQ10002 | Q1,02 | P->P | HASH | |* 6 | HASH JOIN BUFFERED | | Q1,02 | PCWP | | | 7 | PX RECEIVE | | Q1,02 | PCWP | | | 8 | PX SEND HASH | :TQ10000 | Q1,00 | P->P | HASH | | 9 | PX BLOCK ITERATOR | | Q1,00 | PCWC | | | 10 | TABLE ACCESS FULL | T2 | Q1,00 | PCWP | | | 11 | PX RECEIVE | | Q1,02 | PCWP | | | 12 | PX SEND HASH | :TQ10001 | Q1,01 | P->P | HASH | | 13 | PX BLOCK ITERATOR | | Q1,01 | PCWC | | | 14 | TABLE ACCESS FULL | T2 | Q1,01 | PCWP | | ------------------------------------------------------------------------------
  15. 15.  Two major challenges  Can the given task be divided into sub-tasks that can efficiently and independently be processed by the workers? (“Parallel Unfriendly”)  Can all assigned workers be kept busy all the time?
  16. 16. select median(id) from t2; ----------------------------------------------------------------------- | Id | Operation | Name | TQ |IN-OUT| PQ Distrib | ----------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | | | 1 | SORT GROUP BY | | | | | | 2 | PX COORDINATOR | | | | | | 3 | PX SEND QC (RANDOM)| :TQ10000 | Q1,00 | P->S | QC (RAND) | | 4 | PX BLOCK ITERATOR | | Q1,00 | PCWC | | | 5 | TABLE ACCESS FULL| T2 | Q1,00 | PCWP | | -----------------------------------------------------------------------
  17. 17. create table t3 parallel as select * from t2 where rownum <= 10000000; ----------------------------------------------------------------------------- | Id | Operation | Name | TQ |IN-OUT| PQ Distrib | ----------------------------------------------------------------------------- | 0 | CREATE TABLE STATEMENT | | | | | | 1 | PX COORDINATOR | | | | | | 2 | PX SEND QC (RANDOM) | :TQ20001 | Q2,01 | P->S | QC (RAND) | | 3 | LOAD AS SELECT | T3 | Q2,01 | PCWP | | | 4 | PX RECEIVE | | Q2,01 | PCWP | | | 5 | PX SEND ROUND-ROBIN | :TQ20000 | | S->P | RND-ROBIN | |* 6 | COUNT STOPKEY | | | | | | 7 | PX COORDINATOR | | | | | | 8 | PX SEND QC (RANDOM) | :TQ10000 | Q1,00 | P->S | QC (RAND) | |* 9 | COUNT STOPKEY | | Q1,00 | PCWC | | | 10 | PX BLOCK ITERATOR | | Q1,00 | PCWC | | | 11 | TABLE ACCESS FULL| T2 | Q1,00 | PCWP | | -----------------------------------------------------------------------------
  18. 18. create table t3 parallel as select * from (select a.*, lag(filler, 1) over (order by id) as prev_filler from t2 a); -------------------------------------------------------------------------------- | Id | Operation | Name | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------- | 0 | CREATE TABLE STATEMENT | | | | | | 1 | PX COORDINATOR | | | | | | 2 | PX SEND QC (RANDOM) | :TQ20001 | Q2,01 | P->S | QC (RAND) | | 3 | LOAD AS SELECT | T3 | Q2,01 | PCWP | | | 4 | PX RECEIVE | | Q2,01 | PCWP | | | 5 | PX SEND ROUND-ROBIN | :TQ20000 | | S->P | RND-ROBIN | | 6 | VIEW | | | | | | 7 | WINDOW BUFFER | | | | | | 8 | PX COORDINATOR | | | | | | 9 | PX SEND QC (ORDER) | :TQ10001 | Q1,01 | P->S | QC (ORDER) | | 10 | SORT ORDER BY | | Q1,01 | PCWP | | | 11 | PX RECEIVE | | Q1,01 | PCWP | | | 12 | PX SEND RANGE | :TQ10000 | Q1,00 | P->P | RANGE | | 13 | PX BLOCK ITERATOR | | Q1,00 | PCWC | | | 14 | TABLE ACCESS FULL| T2 | Q1,00 | PCWP | | --------------------------------------------------------------------------------
  19. 19. All these examples have one thing in common: If the Query Coordinator (non-parallel part) needs to perform a significant part of the overall work, Parallel Execution won’t reduce the runtime as expected
  20. 20.  Two major challenges  Can the given task be divided into sub-tasks that can efficiently and independently be processed by the workers? (“Parallel Unfriendly”)  Can all assigned workers be kept busy all the time?
  21. 21.  Parallel Execution introduction  Major challenges  Parallel unfriendly examples  Distribution skew examples  How to measure distribution of work  Fixing work distribution skew
  22. 22.  Extended SQL tracing  Generates one trace file per Parallel Worker process in database server trace directory  For cross-instance Parallel Execution this means files spread across more than one server  Each Parallel Worker trace file lists, among others, the number of rows produced per plan line and elapsed time
  23. 23.  Extended SQL tracing Allows detecting data distribution skew Usually requires to reproduce the issue Tedious to collect and skim through dozens of trace files (no tool known that automates that job)
  24. 24.  DBMS_XPLAN.DISPLAY_CURSOR + Rowsource statistics  Based on same internal code instrumentation as extended SQL trace  Feeds back rowsource statistics (rows produced on execution plan line level, timing, I/O stats) into Library Cache  Very useful for analyzing serial executions
  25. 25.  DBMS_XPLAN.DISPLAY_CURSOR + Rowsource statistics Not enabled by default, need to reproduce Doesn’t cope with well multiple Parallel Execution Servers / more complex parallel plans or Cross Instance RAC execution No information per Parallel Execution Server => Therefore not able to detect distribution skew
  26. 26.  Analyzing Data Distribution Skew  Oracle since a long time offers a special view called V$PQ_TQSTAT  It is populated after a successful Parallel Execution in the session of the Query Coordinator  It lists the amount of data send via the “Table Queues” / “Virtual Tables”  In theory this allows exact analysis which operations caused an uneven data distribution
  27. 27.  V$PQ_TQSTAT skewed execution example DFO_NUMBER TQ_ID SERVER_TYP NUM_ROWS BYTES PROCESS ---------- ---------- ---------- ---------- ---------- ---------- 1 0 Producer 500807 54396692 P008 1 0 Producer 499193 54259853 P009 1 0 Consumer 500038 54332349 P010 1 0 Consumer 499962 54324196 P011 1 1 Producer 499826 57267724 P008 1 1 Producer 500174 57357253 P009 1 1 Consumer 499933 57304789 P010 1 1 Consumer 500067 57320188 P011 1 2 Producer 462069 52963939 P010 1 2 Producer 537931 61681312 P011 1 2 Consumer 500212 57346574 P008 1 2 Consumer 499788 57298677 P009 1 3 Producer 500038 57337041 P010 1 3 Producer 499962 57328456 P011 1 3 Consumer 0 48 P008 1 3 Consumer 1000000 114665449 P009 1 4 Producer 0 24 P008 1 4 Producer 1000000 116668401 P009 1 4 Consumer 1000000 116668425 QC
  28. 28.  V$PQ_TQSTAT Allows detecting data distribution skew Can only be queried from inside session Doesn’t cope with well more complex parallel plans No information about relevance of skew
  29. 29. Easy to identify whether all workers are kept busy all the time or not Easy to identify if there was a problem with work distribution Shows actual parallel degree used (“Parallel Downgrade”) Supports RAC
  30. 30. Reports are not persisted and will be flushed from memory quite quickly on busy systems No easy identification and therefore no systematic troubleshooting which plan operations cause a work distribution problem Lacks some precision regarding Parallel Execution details
  31. 31.  Real-Time SQL Monitoring allows detecting Parallel Execution skew in the following way:  In the “Activity” tab  The average active sessions will be less than the DOP of the operation, allows to detect both temporal and data distribution skew  In the “Parallel” tab  The DB Time recorded per Parallel Slave will show an uneven distribution of active work time
  32. 32.  Real-Time SQL Monitoring “Activity” tab – skewed execution example
  33. 33.  Real-Time SQL Monitoring “Parallel” tab – skewed execution example
  34. 34.  One very useful approach is using Active Session History (ASH)  ASH samples active sessions once a second  Activity of Parallel Workers over time can easily be analyzed  From 11g on the ASH data even contains a reference to the execution plan line, so a relation between Parallel Worker activity and execution plan line based on ASH is possible
  35. 35.  Custom queries on ASH data required for detailed analysis  XPLAN_ASH tool runs these queries for a given SQL_ID  Advantage of ASH is the availability of retained historic ASH data via AWR on disk  Information can be extracted even for SQL executions as long ago as the retention configured for AWR
  36. 36.  Parallel Execution introduction  Major challenges  Parallel unfriendly examples  Distribution skew examples  How to measure distribution of work  Fixing work distribution skew
  37. 37. Fixing Work Distribution Skew  Influence Parallel Distribution: Data volume estimates, PQ_DISTRIBUTE / PQ_[NO]MAP / PQ_SKEW (12c+) hint  Limit impact by influencing join order  Rewrite queries trying to limit or avoid skew  Remap skewed data changing the value distribution
  38. 38. Fixing Work Distribution Skew  Automatic join skew detection in 12c (PQ_SKEW)  Supports only parallel HASH JOINs  Supports at present only simple probe row sources (no result of other joins supported)  Histogram showing popular values required (or forced via PQ_SKEW hint)
  39. 39. ---------------------------------------------------------------------------------- | Id | Operation | Name | TQ |IN-OUT| PQ Distrib | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | | | 1 | SORT AGGREGATE | | | | | | 2 | PX COORDINATOR | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10002 | Q1,02 | P->S | QC (RAND) | | 4 | SORT AGGREGATE | | Q1,02 | PCWP | | |* 5 | HASH JOIN | | Q1,02 | PCWP | | | 6 | PX RECEIVE | | Q1,02 | PCWP | | | 7 | PX SEND HYBRID HASH | :TQ10000 | Q1,00 | P->P | HYBRID HASH| | 8 | STATISTICS COLLECTOR | | Q1,00 | PCWC | | | 9 | PX BLOCK ITERATOR | | Q1,00 | PCWC | | |* 10 | TABLE ACCESS FULL | T_1 | Q1,00 | PCWP | | | 11 | PX RECEIVE | | Q1,02 | PCWP | | | 12 | PX SEND HYBRID HASH (SKEW)| :TQ10001 | Q1,01 | P->P | HYBRID HASH| | 13 | PX BLOCK ITERATOR | | Q1,01 | PCWC | | |* 14 | TABLE ACCESS FULL | T_2 | Q1,01 | PCWP | | ----------------------------------------------------------------------------------
  40. 40. Fixing Work Distribution Skew  More information:  http://oracle- randolf.blogspot.com/2014/08/parallel-execution- skew-summary.html
  41. 41. Q & A

×