Introduction to Parallel Execution
An overview presentation covering the use of Oracle's PX functionality including some tips and traps. Detailed white paper at http://oracledoug.com/px.html

  • Good morning. How are those hangovers coming along, then? Today I’m going to talk to you about Oracle’s Parallel Execution capabilities and hopefully give you a few performance issues to take away and think about in the course of your day-to-day work. Let’s have a look at an outline.

Introduction to Parallel Execution – Presentation Transcript

  • Tuning & Tracing Parallel Execution (An Introduction) – Doug Burns (dougburns@yahoo.com)
  • Introduction
    • Introduction
    • Parallel Architecture
    • Configuration
    • Dictionary Views
    • Tracing and Wait Events
    • Conclusion
    • Parallel Query Option introduced in 7.1
      • Now called Parallel Execution
    • Parallel Execution splits a single large task into multiple smaller tasks which are handled by separate processes running concurrently.
      • Full Table Scans
      • Partition Scans
      • Sorts
      • Index Creation
      • And others …
    Introduction
    • A little history
    • So why did so few sites implement PQO?
    Introduction
      • Lack of understanding
      • Leads to horrible early experiences
      • Community's resistance to change
      • Not useful in all environments
      • Needs time and effort applied to the initial design!
    • Isn’t Oracle’s Instance architecture parallel anyway?
    • Non-Parallel Architecture?
    Introduction
  • Parallel Architecture
    • Introduction
    • Parallel Architecture
    • Configuration
    • Dictionary Views
    • Tracing and Wait Events
    • Conclusion
  • Parallel Architecture
    • Diagram: Non-Parallel vs. Parallel (Degree 2) execution
    • The Degree of Parallelism (DOP) refers to the number of discrete threads of work
    • The default DOP for an Instance is calculated as
      • cpu_count * parallel_threads_per_cpu
      • Used if I don’t specify a DOP in a hint or table definition
    • The maximum number of PX slaves is:
      • DOP * 2 (see the worked example below)
      • Plus the Query Coordinator
      • But this is per Data Flow Operation
      • And the slaves will be re-used
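    • A worked example (a sketch – the table name and DOP are illustrative): a DOP 4 query that includes a sort uses two slave sets, i.e. up to 4 * 2 = 8 slaves plus the Query Coordinator
      • SELECT /*+ PARALLEL (attendance, 4) */ *
      • FROM attendance
      • ORDER BY amount_paid;
      • REM From a second session, count the slaves working for each coordinator
      • SELECT qcsid, COUNT(*) AS slaves
      • FROM v$px_session
      • WHERE sid <> qcsid
      • GROUP BY qcsid;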
    Parallel Architecture
  • Parallel Architecture
    • Inter-process communication is through message buffers (also known as table queues)
    • These can be stored in the shared pool or the large pool
  • Parallel Architecture
    • This slide intentionally left blank
    • Methods of invoking Parallel Execution
      • Table / Index Level
          • ALTER TABLE emp PARALLEL(DEGREE 2);
      • Optimizer Hints
          • SELECT /*+ PARALLEL(emp) */ *
          • FROM emp;
        • Note: Using Parallel Execution implies that you will be using the Cost-based Optimiser
        • As usual, appropriate statistics are vital
      • Statement Level
          • ALTER INDEX emp_idx_1 REBUILD
          • PARALLEL 8;
    Parallel Architecture
  • Configuration
    • Introduction
    • Parallel Architecture
    • Configuration
    • Dictionary Views
    • Tracing and Wait Events
    • Conclusion
    • parallel_automatic_tuning
      • First introduced in Oracle 8i
      • This is the first parameter you should set - to TRUE
        • An alternative point of view – don’t use it!
        • Deprecated in 10g, where the default is FALSE, but much of the same functionality is implemented anyway
      • Ensures that message queues are stored in the Large Pool rather than the Shared Pool
      • It modifies the values of other parameters
      • As well as the 10g default values, the following sections show the values when parallel_automatic_tuning is set to TRUE on previous versions
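    • To see which values are actually in effect on your own instance, a simple dictionary query is enough (a sketch using the standard v$parameter view):
      • SELECT name, value, isdefault
      • FROM v$parameter
      • WHERE name LIKE 'parallel%'
      • ORDER BY name;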
    Configuration
    • parallel_adaptive_multi_user
      • First introduced in Oracle 8
      • Default Value – FALSE (TRUE in 10g)
      • Automatic Tuning Default – TRUE
      • Designed for environments that use PX for online, multi-user work
      • As the workload increases, new statements will have their degree of parallelism downgraded.
    Configuration
    • Effective Oracle by Design
      • Tom Kyte
      • ‘This provides the best of both worlds and what users expect from a system. They know that when it is busy, it will run slower.’
    • parallel_max_servers
      • Default – cpu_count * parallel_threads_per_cpu * 2 (the 2 only applies if using automatic PGA management) * 5
        • e.g. 1 CPU * 2 * 2 * 5 = 20 on my laptop
      • The maximum number of parallel execution slaves available for all sessions in this instance.
      • Watch out for the processes trap!
    • parallel_min_servers
      • Default - 0
      • May choose to increase this if PX usage is constant to reduce overhead of starting and stopping slave processes.
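    • Both can be changed dynamically if required (a sketch – the values are purely illustrative and should be sized for your own workload):
      • ALTER SYSTEM SET parallel_min_servers = 8;
      • ALTER SYSTEM SET parallel_max_servers = 32;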
    Configuration (more on this subject in tomorrow’s presentation)
    • parallel_execution_message_size
      • Default Value – 2148 bytes
      • Automatic Tuning Default – 4KB
      • Maximum size of a message buffer
      • May be worth increasing to 8KB, depending on wait event analysis.
      • However, small increases in message size could lead to large increases in large pool memory requirements
      • Remember that DOP 2 relationship and multiple sessions
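    • Note that this is a static parameter on the versions covered here, so a change needs an spfile update and an instance restart (the 8KB value is just an example):
      • ALTER SYSTEM SET parallel_execution_message_size = 8192 SCOPE = SPFILE;
      • REM Restart the instance for the new value to take effect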
    Configuration
    • Metalink Note 201799.1 contains full details and guidance for setting all parameters
    • Ensure that standard parameters are also set appropriately
      • large_pool_size
        • Modified by parallel_automatic_tuning
        • Calculation in Data Warehousing Guide
        • Can be monitored using v$sgastat (see the query sketch below)
      • processes
        • Modified by parallel_automatic_tuning
      • sort_area_size
        • For best results use automatic PGA management
        • Be aware of _smm_px_max_size
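    • The large pool and PX message memory can be checked with a quick v$sgastat query (a sketch – the ‘PX msg pool’ entry only appears once PX has been used):
      • SELECT pool, name, ROUND(bytes/1024/1024, 1) AS mb
      • FROM v$sgastat
      • WHERE name = 'PX msg pool'
      • OR (pool = 'large pool' AND name = 'free memory');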
    Configuration
  • Dictionary Views
    • Introduction
    • Parallel Architecture
    • Configuration
    • Dictionary Views
    • Tracing and Wait Events
    • Conclusion
    • Parallel-specific Dictionary Views
      • SELECT table_name
      • FROM dict
      • WHERE table_name LIKE 'V%PQ%' OR table_name LIKE 'V%PX%';
      • TABLE_NAME
      • ------------------------------
      • V$PQ_SESSTAT
      • V$PQ_SYSSTAT
      • V$PQ_SLAVE
      • V$PQ_TQSTAT
      • V$PX_BUFFER_ADVICE
      • V$PX_SESSION
      • V$PX_SESSTAT
      • V$PX_PROCESS
      • V$PX_PROCESS_SYSSTAT
      • Also GV$PQ_SESSTAT and GV$PQ_TQSTAT with INST_ID
    Dictionary Views
    • v$pq_sesstat
      • Provides statistics relating to the current session
      • Useful for verifying that a specific query is using parallel execution as expected
    • SELECT * FROM v$pq_sesstat;
      • STATISTIC LAST_QUERY SESSION_TOTAL
      • ------------------------------ ---------- -------------
      • Queries Parallelized 1 1
      • DML Parallelized 0 0
      • DDL Parallelized 0 0
      • DFO Trees 1 1
      • Server Threads 3 0
      • Allocation Height 3 0
      • Allocation Width 1 0
      • Local Msgs Sent 217 217
      • Distr Msgs Sent 0 0
      • Local Msgs Recv'd 217 217
      • Distr Msgs Recv'd 0 0
    Dictionary Views
    • v$pq_sysstat
      • The instance-level overview
      • Various values, including information to help set parallel_min_servers and parallel_max_servers
      • v$px_process_sysstat contains similar information
      • SELECT * FROM v$pq_sysstat WHERE statistic LIKE 'Servers%';
      • STATISTIC VALUE
      • ------------------------------ ----------
      • Servers Busy 0
      • Servers Idle 0
      • Servers Highwater 3
      • Server Sessions 3
      • Servers Started 3
      • Servers Shutdown 3
      • Servers Cleaned Up 0
    Dictionary Views
    • v$pq_slave
      • Gives information on the activity of individual PX slaves
      • v$px_process contains similar information
      • SELECT slave_name, status, sessions, msgs_sent_total, msgs_rcvd_total
      • FROM v$pq_slave;
      • SLAV STAT SESSIONS MSGS_SENT_TOTAL MSGS_RCVD_TOTAL
      • ---- ---- ---------- --------------- ---------------
      • P000 BUSY 3 465 508
      • P001 BUSY 3 356 290
      • P002 BUSY 3 153 78
      • P003 BUSY 3 108 63
      • P004 IDLE 2 249 97
      • P005 IDLE 2 246 97
      • P006 IDLE 2 239 95
      • P007 IDLE 2 249 96
    Dictionary Views
    • v$pq_tqstat
      • Shows communication relationship between slaves
      • Must be executed from a session that’s been using parallel operations – refers to this session
      • Example 1 – Attendance Table (25,481 rows)
      • break on dfo_number on tq_id
      • SELECT /*+ PARALLEL (attendance, 4) */ *
      • FROM attendance;
      • SELECT dfo_number, tq_id, server_type, process, num_rows, bytes
      • FROM v$pq_tqstat
      • ORDER BY dfo_number DESC, tq_id, server_type DESC, process;
      • DFO_NUMBER TQ_ID SERVER_TYP PROCESS NUM_ROWS BYTES
      • ---------- ---------- ---------- ---------- ---------- ----------
      • 1 0 Producer P000 6605 114616
      • Producer P001 6102 105653
      • Producer P002 6251 110311
      • Producer P003 6523 113032
      • Consumer QC 25481 443612
    Dictionary Views
    • Example 2 - with a sort operation
      • SELECT /*+ PARALLEL (attendance, 4) */ *
      • FROM attendance
      • ORDER BY amount_paid;
      • DFO_NUMBER TQ_ID SERVER_TYP PROCESS NUM_ROWS BYTES
      • ---------- ---------- ---------- ---------- ---------- ----------
      • 1 0 Ranger QC 372 13322
      • Producer P004 5744 100069
      • Producer P005 6304 110167
      • Producer P006 6303 109696
      • Producer P007 7130 124060
      • Consumer P000 15351 261380
      • Consumer P001 10129 182281
      • Consumer P002 0 103
      • Consumer P003 1 120
      • 1 Producer P000 15351 261317
      • Producer P001 10129 182238
      • Producer P002 0 20
      • Producer P003 1 37
      • Consumer QC 25481 443612
    Dictionary Views
    • So why the unbalanced slaves?
      • Check the list of distinct values in amount_paid
          • SELECT amount_paid, COUNT(*)
          • FROM attendance
          • GROUP BY amount_paid
          • ORDER BY amount_paid
          • /
          •  
          • AMOUNT_PAID COUNT(*)
          • ----------- ----------
          • 200 1
          • 850 1
          • 900 1
          • 1000 7
          • 1150 1
          • 1200 15340
          • 1995 10129
          • 4000 1
    Dictionary Views
    • v$px_session and v$px_sesstat
      • Query to show slaves and physical reads
      • break on qcsid on server_set
      • SELECT stat.qcsid, stat.server_set, stat.server#, nam.name, stat.value
      • FROM v$px_sesstat stat, v$statname nam
      • WHERE stat.statistic# = nam.statistic#
      • AND nam.name = 'physical reads'
      • ORDER BY 1, 2, 3;
      • QCSID SERVER_SET SERVER# NAME VALUE
      • ---------- ---------- ---------- -------------------- ----------
      • 145 1 1 physical reads 0
      • 2 physical reads 0
      • 3 physical reads 0
      • 2 1 physical reads 63
      • 2 physical reads 56
      • 3 physical reads 61
      • physical reads 4792
    Dictionary Views
    • v$px_process
      • Shows parallel execution slave processes, status and session information
      • SELECT * FROM v$px_process;
      • SERV STATUS PID SPID SID SERIAL#
      • ---- --------- ---------- ------------ ---------- ----------
      • P001 IN USE 18 7680 144 17
      • P004 IN USE 20 7972 146 11
      • P005 IN USE 21 8040 148 25
      • P000 IN USE 16 7628 150 16
      • P006 IN USE 24 8100 151 66
      • P003 IN USE 19 7896 152 30
      • P007 AVAILABLE 25 5804
      • P002 AVAILABLE 12 6772
    Dictionary Views
    • Monitoring the SQL being executed by slaves
      • set pages 0
      • column sql_text format a60
      •  
      • select p.server_name,
      • sql.sql_text
      • from v$px_process p, v$sql sql, v$session s
      • WHERE p.sid = s.sid AND p.serial# = s.serial#
      • AND s.sql_address = sql.address AND s.sql_hash_value = sql.hash_value
      • /
      • 9i Results
      • P001 SELECT A1.C0 C0,A1.C1 C1,A1.C2 C2,A1.C3 C3,A1.C4 C4,A1.C5 C5,
        • A1.C6 C6,A1.C7 C7 FROM :Q3000 A1 ORDER BY A1.C0
      • 10g Results
      • P001 SELECT /*+ PARALLEL (attendance, 2) */ * FROM attendance
      • ORDER BY amount_paid
    Dictionary Views
    • Additional information in standard Dictionary Views
      • e.g. v$sysstat
      • SELECT name, value FROM v$sysstat WHERE name LIKE 'PX%';
      • NAME VALUE
      • ---------------------------------------------- ----------
      • PX local messages sent 4895
      • PX local messages recv'd 4892
      • PX remote messages sent 0
      • PX remote messages recv'd 0
    Dictionary Views
    • Monitoring the adaptive multi-user algorithm
      • We need to be able to check whether operations are being downgraded and by how much
      • Downgraded to serial could be a particular problem!
    • SELECT name, value FROM v$sysstat WHERE name LIKE 'Parallel%'
    • NAME VALUE
    • ---------------------------------------------------------------- ----------
    • Parallel operations not downgraded 546353
    • Parallel operations downgraded to serial 432
    • Parallel operations downgraded 75 to 99 pct 790
    • Parallel operations downgraded 50 to 75 pct 1454
    • Parallel operations downgraded 25 to 50 pct 7654
    • Parallel operations downgraded 1 to 25 pct 11873
    Dictionary Views
    • Monitoring the adaptive multi-user algorithm
      • We need to be able to check whether operations are being downgraded and by how much
      • Downgraded to serial could be a particular problem!
    • SELECT name, value FROM v$sysstat WHERE name LIKE 'Parallel%'
    • NAME VALUE
    • ---------------------------------------------------------------- ----------
    • Parallel operations not downgraded 546353
    • P*ssed-off users 432
    • Parallel operations downgraded 75 to 99 pct 790
    • Parallel operations downgraded 50 to 75 pct 1454
    • Parallel operations downgraded 25 to 50 pct 7654
    • Parallel operations downgraded 1 to 25 pct 11873
    • Statspack
      • Example Report (Excerpt)
      • During overnight batch operation
      • Mainly Bitmap Index creation
      • Slightly difficult to read
        • Parallel operations downgraded 1 0
        • Parallel operations downgraded 25 0
        • Parallel operations downgraded 50 7
        • Parallel operations downgraded 75 38
        • Parallel operations downgraded to 1
        • Parallel operations not downgrade 22
      • With one stream downgraded to serial, the rest of the schedule may depend on this one job.
    Dictionary Views
  • Tracing and Wait Events
    • Introduction
    • Parallel Architecture
    • Configuration
    • Dictionary Views
    • Tracing and Wait Events
    • Conclusion
    • Tracing Parallel Execution operations is more complicated than standard tracing
      • One trace file per slave (as well as the query coordinator)
      • Potentially 5 trace files even with a DOP of 2
      • May be in background_dump_dest or user_dump_dest (usually background_dump_dest)
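    • A simple way to get SQL trace for a PX statement is the classic 10046 event, set in the coordinator session (a sketch – on these versions the slaves generally inherit the setting, but each still writes its own trace file):
      • ALTER SESSION SET tracefile_identifier = 'PX_TEST';
      • ALTER SESSION SET events '10046 trace name context forever, level 8';
      • REM Run the parallel statement here
      • ALTER SESSION SET events '10046 trace name context off';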
    Tracing and Wait Events
    • Optimizing Oracle Performance
      • Millsap and Holt
      • ‘The remaining task is to identify and analyze all of the relevant trace files. This task is usually simple…’
    • Much simpler in 10g
      • Use trcsess to generate a consolidated trace file for QC and all slaves
      • exec dbms_session.set_identifier('PX_TEST');
      • REM tracefile_identifier is optional, but might make things easier for you
      • alter session set tracefile_identifier='PX_TEST';
      • exec dbms_monitor.client_id_trace_enable('PX_TEST');
      • REM DO WORK
      • exec dbms_monitor.client_id_trace_disable('PX_TEST');
      • REM Generate the consolidated trace file and then run it through tkprof
      • trcsess output=/ora/admin/TEST1020/udump/PX_TEST.trc clientid=PX_TEST /ora/admin/TEST1020/udump/*px_test*.trc /ora/admin/TEST1020/bdump/*.trc
      • tkprof /ora/admin/TEST1020/udump/PX_TEST.trc /ora/admin/TEST1020/udump/PX_TEST.out
    Tracing and Wait Events
    • This is what one of the slaves looks like
      • C:\oracle\product\10.2.0\admin\ORCL\udump>cd ..\bdump
      • C:\oracle\product\10.2.0\admin\ORCL\bdump>more orcl_p000_2748.trc
      • <SNIPPED>
      • *** SERVICE NAME:(SYS$USERS) 2006-03-07 10:57:29.812
      • *** CLIENT ID:(PX_TEST) 2006-03-07 10:57:29.812
      • *** SESSION ID:(151.24) 2006-03-07 10:57:29.812
      • WAIT #0: nam= 'PX Deq: Msg Fragment' ela= 13547 sleeptime/senderid=268566527 passes=1 p3=0 obj#=-1 tim=3408202924
      • =====================
      • PARSING IN CURSOR #1 len=60 dep=1 uid=70 oct=3 lid=70 tim=3408244715 hv=1220056081 ad='6cc64000'
      • select /*+ parallel(test_tab3, 2) */ count(*)
      • from test_tab3
      • END OF STMT
    Tracing and Wait Events
    • Many more wait events and more time spent waiting
      • The various processes need to communicate with each other
      • Metalink Note 191103.1 lists the wait events related to Parallel Execution
      • But be careful of what ‘Idle’ means
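    • An instance-wide picture of the PX-related waits is easy to get (a sketch – use v$session_event instead to focus on a single session):
      • SELECT event, total_waits, time_waited
      • FROM v$system_event
      • WHERE event LIKE 'PX%'
      • ORDER BY time_waited DESC;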
    Tracing and Wait Events
    • Events indicating consumers or QC are waiting for data from producers
      • PX Deq: Execute Reply
      • PX Deq: Table Q Normal
    • Although considered idle events, if these waits are excessive, it could indicate a problem in the performance of the slaves
    • Investigate the slave trace files
    Tracing and Wait Events
    • Events indicating producers are quicker than consumers (or QC)
      • PX qref latch
    • Try increasing parallel_execution_message_size as this might reduce the communications overhead
    • Although it could make things worse if the consumer is just taking time to process the incoming data.
    Tracing and Wait Events
    • Messaging Events
      • PX Deq Credit: need buffer
      • PX Deq Credit: send blkd
    • Although there may be many waits, the time spent should not be a problem.
    • If it is, perhaps you have an extremely busy server that is struggling to cope
      • Reduce DOP? (see the example below)
      • Increase parallel_execution_message_size?
      • Don’t use PX?
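    • Reducing the DOP can be as simple as changing the table-level setting (the table name is just the example used earlier):
      • ALTER TABLE attendance PARALLEL (DEGREE 2);
      • ALTER TABLE attendance NOPARALLEL;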
    Tracing and Wait Events
    • Query Coordinator waiting for the slaves to parse their SQL statements
      • PX Deq: Parse Reply
    • If there are any significant waits for this event, this may indicate you have shared pool resource issues.
    • Or you’ve encountered a bug!
    Tracing and Wait Events
    • Partial Message Event
      • PX Deq: Msg Fragment
    • May be eliminated or improved by increasing parallel_execution_message_size
    • Not an issue on recent tests
    Tracing and Wait Events
    • Example
      • Excerpt from an overnight Statspack Report
      • Event                       Waits  Timeouts  Time (s)  Avg (ms)  Waits/txn
      • ----------------------  ---------  --------  --------  --------  ---------
      • direct path read        2,249,666         0   115,813        51       25.5
      • PX Deq: Execute Reply     553,797    22,006    75,910       137        6.3
      • PX qref latch              77,461    39,676    42,257       546        0.9
      • library cache pin          27,877    10,404    31,422      1127        0.3
      • db file scattered read  1,048,135         0    25,144        24       11.9
      • Direct Path Reads
        • Sort I/O
        • Read-ahead
        • PX Slave I/O
        • The average wait time – SAN!
    Tracing and Wait Events
      • Event                       Waits  Timeouts  Time (s)  Avg (ms)  Waits/txn
      • ----------------------  ---------  --------  --------  --------  ---------
      • direct path read        2,249,666         0   115,813        51       25.5
      • PX Deq: Execute Reply     553,797    22,006    75,910       137        6.3
      • PX qref latch              77,461    39,676    42,257       546        0.9
      • library cache pin          27,877    10,404    31,422      1127        0.3
      • db file scattered read  1,048,135         0    25,144        24       11.9
      • PX Deq: Execute Reply
        • Idle event – QC waiting for a response from slaves
        • Some waiting is inevitable
      • PX qref latch
        • Largely down to the extreme use of Parallel Execution
        • Practically unavoidable but perhaps we could increase parallel_execution_message_size?
      • Library cache pin?
        • Need to look at the trace files
    Tracing and Wait Events
  • Conclusion
    • Introduction
    • Parallel Architecture
    • Configuration
    • Dictionary Views
    • Tracing and Wait Events
    • Conclusion
    • Plan / Test / Implement
      • Asking for trouble if you don’t!
    • Hardware
      • It’s designed to suck the server dry
      • Trying to squeeze a quart into a pint pot will make things slow down due to contention
    • Tune the SQL first
      • All the old rules apply
      • The biggest improvements come from doing less unnecessary work in the first place
      • Even if PX does make things go quickly enough, it’s going to use a lot more resources doing so
    Conclusion
    • Don’t use it for small, fast tasks
      • They won’t go much quicker
      • They might go slower
      • They will use more resources
    • Don’t use it for online
      • Not unless it’s a handful of users
      • With a predictable maximum number of concurrent activities
      • Who understand the implications and won’t go crazy when something takes four times as long as normal!
      • It gives a false initial perception of high performance and isn’t scalable
      • Okay, Tom, set parallel_adaptive_multi_user to TRUE
    Conclusion
    • The slower your I/O sub-system, the more benefit you are likely to see from PX
      • But shouldn’t you fix the underlying problem?
      • More on this in the next presentation
    • Consider whether PX is the correct parallel solution for overnight batch operations
      • A single stream of parallel jobs?
      • Parallel streams of single-threaded jobs?
      • Unfortunately you’ll probably have to do some work to prove your ideas!
    Conclusion
  • Tuning & Tracing Parallel Execution (An Introduction) – Doug Burns (dougburns@yahoo.com) (oracledoug.blogspot.com) (doug.burns.tripod.com)