
How Many Slaves (Ukoug)


UKOUG version of a presentation trying to establish the sensible limits of parallelism on a couple of hardware configurations. Detailed white paper is at

Published in: Technology

  1. How Many Slaves? Parallel Execution and the Magic of 2
      Doug Burns [email_address]
  2. Introduction
      • Introduction
      • What is the Magic of ‘2’?
      • What Tests?
      • Test Scripts and Tools
      • Test Results
      • When is a Conclusion …
  3. Introduction
      • Who (or what) am I?
          • Scottish
          • Predominantly a DBA
          • Training and Consultancy
      • Current Assignment
          • BSkyB
          • Very Cool Projects and Hardware
          • Less Cool Release Management
      • Blog
  4. Introduction
      • Why Parallel Execution?
          • Increasing Volumes of Data
          • Increasing User Expectations
          • More Powerful Hardware
      • Parallel Execution (PX) splits a single large task into multiple smaller tasks which are handled by separate processes running concurrently.
          • Full Table Scans
          • Sorts
          • Index Creation, Direct Path inserts etc …
  5. Introduction
      • Previous Paper
          • Suck It Dry – Tuning Parallel Execution
          • (.doc & .pdf)
          • Reviewer comments on parallel_max_servers
          • Debate about parallel_adaptive_multi_user
          • Something about the Magic of ‘2’
          • Talked about Hardware, but nothing specific
      • ‘Sometimes when faced with a slow I/O subsystem you might find that higher degrees of parallelism are useful because the CPUs are spending more time waiting for I/O to complete’
  6. Introduction
      • Always set customer expectation levels
          • I hope you didn’t come here looking for answers!
          • Or lots of detail
      • (or .doc)
          • An interesting story, nonetheless
          • A framework for your own tests
          • A glance at some results
  7. Introduction
  8. What is The Magic of ‘2’?
      • Batch Queue Management and the Magic of ‘2’
          • Cary Millsap (2000) - available at
      • How many batch processes to execute per CPU?
          • 2. Well, a range of values really, between 1 and 1.8?
          • Most recent work expands on this
              • CPU-intensive batch jobs per CPU <2 (nearer to 1)
              • I/O-intensive batch jobs per CPU >2
              • CPU and I/O request durations are exactly equal (rare) - CPU * 2
          • Misconfiguration could change everything
          • What is a batch job anyway?
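The three cases on this slide can be illustrated with a back-of-envelope model. This is only a sketch under a simplifying assumption (each job alternates between CPU work and I/O waits, and queueing delays are ignored); it is not Millsap's analysis, and `jobs_per_cpu` is a hypothetical helper name.

```python
def jobs_per_cpu(cpu_fraction: float) -> float:
    """Concurrent jobs needed to keep one CPU busy.

    cpu_fraction is the share of a job's elapsed time spent on CPU;
    the remainder is assumed to be I/O wait. Queueing is ignored, so
    treat the result as a rough starting point, not a prediction.
    """
    if not 0.0 < cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be in (0, 1]")
    return 1.0 / cpu_fraction


if __name__ == "__main__":
    # Illustrative fractions only; they are not measurements from the tests.
    for label, frac in [("CPU-intensive", 0.9), ("balanced", 0.5), ("I/O-intensive", 0.25)]:
        print(f"{label:14s} cpu_fraction={frac:.2f} -> {jobs_per_cpu(frac):.2f} jobs per CPU")
```

Under this model a job that splits its time equally between CPU and I/O gives exactly 2 jobs per CPU, CPU-heavy jobs drift towards 1, and I/O-heavy jobs go above 2, which matches the shape of the slide's summary.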
  9. What is The Magic of ‘2’?
      • Oracle 10.2 Docs mention the Magic of ‘2’
      • PARALLEL_THREADS_PER_CPU enables you to adjust for hardware configurations with I/O subsystems that are slow relative to the CPU speed and for application workloads that perform few computations relative to the amount of data involved.
      • If the system is neither CPU-bound nor I/O-bound, then the PARALLEL_THREADS_PER_CPU value should be increased. This increases the default DOP and allows better utilization of hardware resources.
      • The default for PARALLEL_THREADS_PER_CPU on most platforms is two. However, the default for machines with relatively slow I/O subsystems can be as high as eight.
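How PARALLEL_THREADS_PER_CPU feeds into the default DOP can be sketched as simple arithmetic. This assumes the usual single-instance relationship (default DOP = CPU count x threads per CPU, multiplied by the instance count on a cluster); the DOP actually granted at run time can be lower, for example when parallel_max_servers or adaptive multi-user throttling applies. `default_dop` is a hypothetical helper name.

```python
def default_dop(cpu_count: int, threads_per_cpu: int = 2, instances: int = 1) -> int:
    """Default degree of parallelism under the assumed formula.

    threads_per_cpu mirrors PARALLEL_THREADS_PER_CPU (default of two on
    most platforms, according to the documentation quoted above).
    """
    if min(cpu_count, threads_per_cpu, instances) < 1:
        raise ValueError("all arguments must be >= 1")
    return cpu_count * threads_per_cpu * instances


if __name__ == "__main__":
    # CPU counts taken from the test hardware described later in the deck.
    for name, cpus in [("Single-CPU PC", 1), ("ISP4400", 4), ("E10K", 12)]:
        print(f"{name:14s} {cpus:2d} CPUs -> default DOP {default_dop(cpus)}")
```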
  10. What Tests?
  11. What Tests?
      • What should I test?
          • Parallel operations (obviously)
          • Multiple CPUs
          • I/O infrastructure
      • Operating System – Unix / Linux
          • Free (as in beer)
          • Cross-platform
          • Tools and Utilities
      • Oracle Version – 10.2
          • The latest and greatest, or common and well-known?
          • Boy, that was a good choice.
      • Workloads – Keep it simple
          • Data!
          • CPU vs I/O balance
  12. What Tests?
      • First attempt
          • Full Table scan of a 2 million row table
              • PCTFREE 90 expanded it to 2.8GB
              • Small enough for all platforms
              • Big enough to exercise the I/O subsystem properly
              • NOT! EMC took 7 seconds.
      • Second attempt
          • Full Table scan of 8 million row table
              • PCTFREE 90 expanded it to 10GB
              • Too big for the little PC now! (Used 1/8 of the data)
              • Solved most problems
              • But too I/O intensive (More on this later)
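Why PCTFREE 90 inflates a table so much can be sketched with rough arithmetic: only about a tenth of each block is available for inserts, so the same rows need roughly ten times as many blocks. The row length, block size and per-block overhead below are illustrative assumptions, not figures from the tests, and `estimated_blocks` is a hypothetical helper.

```python
import math


def estimated_blocks(rows: int, avg_row_bytes: int, pctfree: int,
                     block_bytes: int = 8192, block_overhead: int = 100) -> int:
    """Rough block count for a heap table loaded by conventional inserts.

    Assumes a fixed per-block overhead and that inserts stop filling a
    block once only pctfree percent of it remains free. At least one
    row is always placed in a block.
    """
    usable = (block_bytes - block_overhead) * (100 - pctfree) // 100
    rows_per_block = max(1, usable // avg_row_bytes)
    return math.ceil(rows / rows_per_block)


if __name__ == "__main__":
    # Hypothetical 150-byte rows in 8KB blocks; only the row count is from the slide.
    for pctfree in (10, 90):
        blocks = estimated_blocks(2_000_000, 150, pctfree)
        print(f"PCTFREE {pctfree}: {blocks} blocks, {blocks * 8192 / 1024**3:.2f} GiB")
```

The point of the comparison is the ratio between the two settings rather than the absolute sizes, which depend entirely on the assumed row length.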
  13. What Tests?
      • Third attempt
          • FTS plus a Hash Join and Sort of two 8 million row tables
              • PCTFREE 90 expanded them to over 10GB
              • Unsuitable for the PC, used 1/8 data again
              • Started to produce more interesting results
      • Multi-user tests
          • More on these later
              • 8 new 1 million row tables
              • PCTFREE 90 expanded them to 147MB each
  14. What Tests?
      • The Test Process will be much easier if you have
          • Enough Time
          • Appropriate Hardware
          • A Dedicated Assistant
          • A Pleasant Working Environment
      • Two out of Four ain’t bad …
  15. What Tests?
      • Intel Single-CPU PC – Tulip PC
          • White Box Linux – Kernel 2.6.9
          • 1 x 550MHz Pentium 3
          • 768MB RAM
          • Single 20GB IDE
      • Intel SMP Server – Intel ISP4400 (SRKA4)
          • White Box Linux – Kernel 2.6.9
          • 4 x 700MHz Pentium 3 Xeon
          • 3.5GB RAM
          • 4 x Seagate Cheetah U-160 SCSI
              • Software RAID-0 (256KB stripe)
          • Separate system/software disk
          • Enable/Disable CPUs by editing grub.conf
          • £300 on eBay including all HDD and shipping
  16. What Tests?
      • Enterprise SMP server – Sun E10K
          • Solaris 8
          • 12 x 400MHz SPARC
          • 12GB RAM
          • EMC Symmetrix 8730 via Brocade SAN
          • 5 x Hard Disk Slices (Hypers) in RAID 1+0 (960KB stripe)
          • Enable/Disable CPUs using psradm
      • Yes, really!
          • We had some spare kit kicking around. (Thanks, Mike)
      • DBA Lessons
          • #1 – Always be nice to System and Storage Administrators
          • #2 – Work for companies with a lot of money
  17. Test Scripts and Tools
  18. Test Scripts and Tools
      • init.ora
          • Disabled parallel_adaptive_multi_user
          • Set parallel_max_servers to 512
              • I forgot to increase this a couple of times
          • A stupid mistake in the paper (and an important lesson)
              • parallel_max_servers=512 keeps defaulting to 385?
              • processes=400 !
      • Setup scripts
          • To be able to recreate environment easily
          • setup1.sql – Tablespaces, user account and privs
          • setup2.sql – Create two 8 million row / 11GB tables
          • setup3.sql – Create eight 1 million row / 147MB tables.
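The parallel_max_servers lesson on this slide comes down to one subtraction. The sketch below assumes the instance caps the setting at processes minus a fixed reserve of 15, which is what the 400 and 385 figures on the slide imply; the exact reserve may differ by version, so check the alert log on your own system. Both function names are hypothetical.

```python
PROCESS_RESERVE = 15  # assumption inferred from the slide: 400 - 385


def effective_parallel_max_servers(requested: int, processes: int,
                                   reserve: int = PROCESS_RESERVE) -> int:
    """Value parallel_max_servers ends up with under the assumed cap."""
    return max(0, min(requested, processes - reserve))


def processes_needed(requested: int, reserve: int = PROCESS_RESERVE) -> int:
    """Smallest processes setting that leaves the requested value intact."""
    return requested + reserve


if __name__ == "__main__":
    print(effective_parallel_max_servers(512, 400))  # the situation on the slide
    print(processes_needed(512))
```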
  19. Test Scripts and Tools
      • Test scripts
          • To run selected SQL statements consistently across a range of DOPs, unattended.
          • – FTS and HJ/Sort against the big tables
          • – HJ/Sort of one big table and one of the smaller tables, accepting a session parameter so that multiple copies can run concurrently
          • – Harness script that runs for a given number of users
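The idea of running one statement consistently across a range of DOPs can be sketched by generating hinted SQL text. This is not a reconstruction of the author's scripts (their names are missing from this transcript); the table name `test_tab1`, the alias `t` and the DOP list are hypothetical, and the hints use the `PARALLEL(alias, degree)` and `NO_PARALLEL(alias)` forms.

```python
# Hypothetical statement template; {hint} is filled in per DOP.
BASE_SQL = "SELECT {hint} COUNT(*) FROM test_tab1 t"


def build_statements(dops):
    """Return one hinted statement per requested DOP.

    A DOP of 1 or less is treated as a serial run and gets a
    NO_PARALLEL hint; anything higher gets an explicit PARALLEL hint.
    """
    statements = []
    for dop in dops:
        if dop <= 1:
            hint = "/*+ no_parallel(t) */"
        else:
            hint = f"/*+ parallel(t, {dop}) */"
        statements.append(BASE_SQL.format(hint=hint))
    return statements


if __name__ == "__main__":
    for stmt in build_statements([1, 2, 4, 8, 16]):
        print(stmt + ";")
```

The generated text could be written to a script file and run unattended from SQL*Plus with timing enabled, so that every DOP is exercised by exactly the same statement.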
  20. Test Scripts and Tools
      • Information Collection
          • Simple log file
              • SQL statements
              • Output
              • Timings
              • Autotrace
              • v$pq_tqstat query after each statement
          • 10046 Trace File
              • Consolidated version, using client_id and trcsess
              • tkprof output too
              • Watch the overhead in disk space and trcsess run time!
          • System Statistics
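Consolidating the per-slave 10046 trace files could look something like the sketch below, which only builds the command lines rather than running them. It assumes the session set a client identifier (for example via DBMS_SESSION.SET_IDENTIFIER) and uses the `output=` and `clientid=` arguments of trcsess; the identifier and file names are hypothetical.

```python
def trcsess_command(client_id, trace_files, output="consolidated.trc"):
    """Build a trcsess argv that merges trace files for one client_id."""
    if not trace_files:
        raise ValueError("at least one trace file is required")
    return ["trcsess", f"output={output}", f"clientid={client_id}", *trace_files]


def tkprof_command(trace_file, report_file):
    """Build a tkprof argv that formats the consolidated trace."""
    return ["tkprof", trace_file, report_file]


if __name__ == "__main__":
    # Hypothetical trace file names: one query coordinator, two slaves.
    files = ["orcl_ora_1234.trc", "orcl_p000_1240.trc", "orcl_p001_1242.trc"]
    print(" ".join(trcsess_command("PX_TEST", files)))
    print(" ".join(tkprof_command("consolidated.trc", "consolidated.prf")))
```

The lists can be passed to `subprocess.run` on a host where the Oracle utilities are on the PATH; as the slide warns, the consolidated file can be large and trcsess can take a while on long test runs.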
  21. Test Scripts and Tools
      • Operating System Statistics
          • Resource Usage
          • Bottlenecks
          • Long-running tests – likely to be a lot of data!
      • ORCA/orcallator
          • Go for the latest development tarball, which includes
              • procallator for Linux statistics collection
          • Easy configuration to generate HTML output
          • Pretty graphs!
          • Lots of them in the paper, but not here.
  22. Test Results
  23. PC – 1 CPU – 1.3GB
  24. ISP4400 – 1-4 CPUs – FTS 11GB
  25. ISP4400 – 1-4 CPUs – HJ 22GB
  26. E10K – 1-12 CPUs – FTS 11GB
  27. E10K – 1-12 CPUs – HJ 22GB
  28. Multi-user Tests
      • First attempt
          • Hash Join/Sort statement only
          • 170MB Tables – 128,000 rows (PCTFREE 90)
          • Between 1 and 12 concurrent users, noparallel to DOP 4
          • Showed how quickly PX response drops off with multiple users
          • Then I noticed something strange in the V$PQ_TQSTAT output
          • Slaves weren’t doing much work.
      • What’s that sound I can hear?
          • PCTFREE 90 – lots of disk I/O (largely empty blocks)
          • Very small data volumes feeding into later stages of the plan!
          • Mmmmm … Perhaps that doesn’t test the CPUs too well
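One way to spot slaves that are doing little work is to summarise the V$PQ_TQSTAT rows per table queue. The sketch below works on rows already fetched as (tq_id, server_type, process, num_rows) tuples, which correspond to columns of that view; the sample numbers are invented for illustration and are not results from these tests.

```python
from collections import defaultdict


def row_share_by_process(rows):
    """Share of each table queue's rows handled by each process.

    rows is a list of (tq_id, server_type, process, num_rows) tuples.
    Returns {(tq_id, server_type, process): fraction of that queue's rows}.
    """
    totals = defaultdict(int)
    for tq_id, server_type, _process, num_rows in rows:
        totals[(tq_id, server_type)] += num_rows

    shares = {}
    for tq_id, server_type, process, num_rows in rows:
        total = totals[(tq_id, server_type)]
        shares[(tq_id, server_type, process)] = num_rows / total if total else 0.0
    return shares


if __name__ == "__main__":
    # Invented sample: two producers feeding two consumers on table queue 0.
    sample = [
        (0, "Producer", "P002", 500),
        (0, "Producer", "P003", 500),
        (0, "Consumer", "P000", 990),
        (0, "Consumer", "P001", 10),
    ]
    for key, share in sorted(row_share_by_process(sample).items()):
        print(key, f"{share:.1%}")
```

Low absolute row counts across all slaves point to the problem described on the slide (tiny data volumes reaching later plan stages), while very uneven shares point to skew.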
  29. Multi-user Tests
  30. Doh!
      • If the CPUs weren’t working hard enough on the multi-user tests, then …
      • I should re-run the Single User/Volume Tests
  31. Single User Volume Tests II
  32. When is a Conclusion …
  33. When is a Conclusion …
      • … not a Conclusion?
          • When it contains lots of mights, maybes and coulds?
          • When you’ve been testing the wrong thing?
      • IF you’re the only user of the server and it has more than one CPU and enough disks then
          • You should definitely give PX at a DOP of 2 a try
          • Benefit from the direct path I/O, not the parallelism?
              • _serial_direct_read=true
      • Benefits diminish rapidly
          • If using an unsuitable disk configuration, like these tests
          • Then again, I think a lot of people are
  34. When is a Conclusion …
      • The only way to know for sure is to test your SQL, with your data, with a range of DOPs
          • Then choose something below the apparent optimum?
      • Parallel Execution loves hardware
          • But it’s not just about having loads of kit
          • You need to have the right balance of CPU, Memory and I/O bandwidth
          • Bottlenecks will become apparent more quickly
      • Don’t use it for online
          • Unless it’s a handful of users
          • With a predictable maximum number of concurrent activities
          • Set parallel_adaptive_multi_user to TRUE? (10g default)
          • You must explain it to your users!
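The advice to pick something below the apparent optimum can be turned into a small selection rule: take the lowest DOP whose elapsed time is within some tolerance of the best run. This is one possible interpretation, not the author's method; the 10% tolerance and the timings are hypothetical.

```python
def pick_dop(timings, tolerance=0.10):
    """Smallest DOP whose elapsed time is within tolerance of the best.

    timings maps DOP -> elapsed seconds from your own test runs.
    """
    if not timings:
        raise ValueError("timings must not be empty")
    best = min(timings.values())
    limit = best * (1.0 + tolerance)
    return min(dop for dop, secs in timings.items() if secs <= limit)


if __name__ == "__main__":
    # Invented timings: gains flatten out and then reverse at higher DOPs.
    timings = {1: 100.0, 2: 55.0, 4: 40.0, 8: 38.0, 16: 41.0}
    print("fastest DOP:", min(timings, key=timings.get))
    print("chosen DOP :", pick_dop(timings))
```

Choosing the lower setting gives up a little elapsed time in exchange for fewer slave processes, which leaves headroom when other work arrives on the server.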
  35. When is a Conclusion …
      • More things to try
          • Bigger stripe widths and filesystem options (DONE)
          • Different extent and block sizes (DONE)
          • Disk-separated data files and Hash Partitioned Tables
          • Hardware RAID
          • Different Automatic PGA Management Settings
          • Oracle’s Default PX Parameter Values
          • Different SQL
      • What have I started?!?
          • What price an old EMC Symmetrix on eBay?
          • Do you think Scottish Power do 3-phase power for domestic customers?
          • How will I explain the noise to Housemates and Partner?
  36. When is a Conclusion …
      • The scripts are there
          • Tailor them to your needs. Improve them!
          • Let me know your results – I’m interested.
          • Including details of your environment
          • Data creation scripts
          • Your SQL
  37. How Many Slaves? Parallel Execution and the Magic of 2
      Doug Burns [email_address]