  1. Utilizing Multicore Processors with OpenMP. Pete Isensee, Microsoft Corporation. Presented by Eric Cheng, CGGT 12
  2. Why do we need to utilize multicore?
     • CPU clock speeds are not improving at the rates we’ve become accustomed to over the past decade
     • Games today are more complex, with more objects in their scenes than ever
     • We have to leverage the power of multicore processors in our game engines
  3. OpenMP
     • The industry often needs to schedule dozens and sometimes hundreds of independent processors that have a shared view of memory
     • OpenMP is one such solution
     • OpenMP is a portable, industry-standard API and programming protocol for C/C++ that supports parallel programming
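The directive style the slide describes can be sketched with a minimal example (the function name and data here are illustrative, not from the talk): a serial loop becomes parallel by adding a single pragma, and the pragma is simply ignored by compilers without OpenMP support.

```cpp
// Sum an array with the work split across an OpenMP thread team.
// The reduction clause gives each thread a private partial sum and
// combines them at the end; built without OpenMP, the loop runs serially
// and produces the same result.
int SumRange(const int* data, int n) {
    int total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```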
  4. OpenMP Support
     • OpenMP is managed by the non-profit technology consortium OpenMP Architecture Review Board (OpenMP ARB) and is jointly defined by a group of major computer hardware and software vendors, including AMD, IBM, Intel, Cray, HP, Fujitsu, NVIDIA, NEC, Microsoft, Texas Instruments, VMware, Oracle Corporation, and more
     • The ARB published its first API specification, OpenMP for Fortran 1.0, in October 1997; the C/C++ standard followed in October the next year. Version 2.0 of the Fortran specification arrived in 2000, with version 2.0 of the C/C++ specification released in 2002. Version 2.5, released in 2005, is a combined C/C++/Fortran specification
     • Version 3.0 was released in May 2008. Among its new features are the concept of tasks and the task construct; these are summarized in Appendix F of the OpenMP 3.0 specification
     • Version 3.1 of the OpenMP specification was released July 9, 2011
  5. OpenMP Implementation
     • OpenMP has been implemented in many commercial compilers. For instance, Visual C++ 2005, 2008, and 2010 support it (in their Professional, Team System, Premium, and Ultimate editions), as does Intel Parallel Studio for various processors. Sun Studio compilers and tools support the latest OpenMP specification, with productivity enhancements for the Solaris OS (UltraSPARC and x86/x64) and Linux platforms. The Fortran, C, and C++ compilers from The Portland Group also support OpenMP 2.5, and GCC has supported OpenMP since version 4.2
     • A few compilers have early implementations of OpenMP 3.0, including GCC 4.3.1, the Nanos compiler, the Intel Fortran and C/C++ version 11.0 and 11.1 compilers, Intel C/C++ and Fortran Composer XE 2011, Intel Parallel Studio, and the IBM XL C/C++ compiler
     • Sun Studio 12 Update 1 has a full implementation of OpenMP 3.0
  6. OpenMP Example: Particle System
     • Suppose your game has a particle system that recalculates the positions of all particles once per frame
     • If we have two threads, we want to split the loop iterations between them
     • OpenMP allows you to do that with a single directive
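The slide's code is not in the transcript; a minimal sketch of the loop it describes, assuming a GetNewParticlePos-style update function (the Particle struct and velocity here are stand-ins), might look like this:

```cpp
#include <vector>

struct Particle { float x, y, z; };

// Stand-in for the talk's GetNewParticlePos; just advances the particle
// by a fixed step so the example is self-contained.
static Particle GetNewParticlePos(const Particle& p) {
    return Particle{ p.x + 0.1f, p.y + 0.1f, p.z + 0.1f };
}

// Recalculate every particle position once per frame. With an
// OpenMP-enabled compiler, the iterations are divided among the thread
// team; each index is touched by exactly one thread, so no locking is
// needed. Without OpenMP the pragma is ignored and the loop runs serially.
void UpdateParticles(std::vector<Particle>& particles) {
    const int numParticles = static_cast<int>(particles.size());
    #pragma omp parallel for
    for (int i = 0; i < numParticles; ++i) {
        particles[i] = GetNewParticlePos(particles[i]);
    }
}
```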
  7. What the compiler does here
     • An OpenMP-enabled compiler generates code that automatically splits this for loop into multiple parallel sections, each of which executes independently. The number of parallel sections depends on the hardware, the OpenMP runtime, and configuration settings that the programmer has established
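One way to see those configuration settings in action (a sketch, not from the talk): OpenMP-enabled compilers define the `_OPENMP` macro, so a program can query the team size with a serial fallback, and the answer will vary with the hardware and with settings such as the `OMP_NUM_THREADS` environment variable.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Report how many threads the runtime will use for a parallel region.
// Inside the region, a single thread records the team size; when the
// compiler has no OpenMP support, the serial fallback of 1 is returned.
int TeamSize() {
    int threads = 1;  // serial fallback
#ifdef _OPENMP
    #pragma omp parallel
    {
        #pragma omp single
        threads = omp_get_num_threads();
    }
#endif
    return threads;
}
```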
  8. Performance
     • Suppose that numParticles is 100,000 and GetNewParticlePos takes a small, fixed amount of time
     • The table below shows the performance on two different systems using the Visual Studio 2005 compiler. The first system is a desktop PC with dual Xeon processors, each with two hyperthreads. The second system is an Xbox 360, which has three CPUs, each with two hardware threads

       Hardware                      | OpenMP threads | OpenMP perf gain | Windows threads perf gain
       Dual-core 2.3-GHz Pentium     | 4              | 2.9x             | 2.9x
       Triple-core 3.2-GHz Xbox 360  | 6              | 4.6x             | 4.5x
  9. Comparison with Windows thread calls
     • Writing the previous example with Windows thread calls and synchronization primitives required over 60 lines of code, not to mention considerable debugging and tuning effort
     • The performance gain was virtually the same as with OpenMP. In fact, on Xbox 360 the overhead of calling Windows synchronization primitives was higher than using OpenMP, because the OpenMP runtime is tuned to call kernel exports directly
     • From this example we learn that OpenMP can provide major benefits for a very small investment
  10. OpenMP Example: Collision Detection
     • Dynamic scheduling is used to tell the compiler to schedule the thread team at runtime rather than simply dividing the iterations evenly between the threads
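The slide's code is not in the transcript; a sketch of the pattern it describes, with assumed SphereIntersect-style helpers, might look like the following. Because only some iterations pass the cheap sphere test and go on to do expensive work, the per-iteration cost is uneven, which is exactly when `schedule(dynamic)` pays off.

```cpp
#include <vector>

struct Sphere { float x, y, z, r; };

// Cheap broad-phase test (assumed stand-in for the talk's SphereIntersect):
// two spheres intersect when the squared distance between centers is no
// greater than the squared sum of radii.
static bool SphereIntersect(const Sphere& a, const Sphere& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    float rs = a.r + b.r;
    return dx * dx + dy * dy + dz * dz <= rs * rs;
}

// Count objects colliding with a target. schedule(dynamic) hands iterations
// to threads as they finish instead of splitting the range evenly up front,
// so threads that draw cheap iterations don't sit idle while others run the
// expensive narrow phase. The reduction combines per-thread hit counts.
int CountCollisions(const std::vector<Sphere>& objs, const Sphere& target) {
    int hits = 0;
    #pragma omp parallel for schedule(dynamic) reduction(+:hits)
    for (int i = 0; i < static_cast<int>(objs.size()); ++i) {
        if (SphereIntersect(objs[i], target)) {
            // An expensive narrow-phase ObjectIntersect test would go here.
            ++hits;
        }
    }
    return hits;
}
```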
  11. Performance
     • Suppose that numParticles is 1,000, SphereIntersect takes a small, fixed amount of time, and ObjectIntersect takes 100 times as long as SphereIntersect. Also assume a 10% sphere intersection rate and a 1% object intersection rate
     • OpenMP typically adds very little runtime overhead in terms of either size or speed. The Visual Studio OpenMP DLL is only 60 KB

       Hardware                      | OpenMP threads | OpenMP perf gain
       Dual-core 2.3-GHz Pentium     | 4              | 3.3x
       Triple-core 3.2-GHz Xbox 360  | 6              | 5.4x
  12. Function Parallelism
     • Apart from data parallelism, OpenMP can also be used for function-level parallelism. Consider quicksort
     • Each of the recursive calls to qsort is completely independent of the other
  13. Function Parallelism
     • One way is to split the independent recursive calls into their own OpenMP parallel sections
     • To execute the calls in parallel, they’re wrapped in braces so that OpenMP knows which portions to run in parallel. On most platforms, however, the overhead of the recursive parallel sections is likely to outweigh the benefits
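The slide's code is not in the transcript; a sketch of the naive approach it describes might look like this, with each independent recursive call placed in its own `section` inside a braced `parallel sections` block. As the slide notes, spawning a parallel region at every level of the recursion is usually too expensive to be a win.

```cpp
#include <algorithm>  // std::swap

// Quicksort over data[lo..hi] (inclusive) using a Lomuto partition, with
// the two independent recursive calls placed in OpenMP parallel sections.
// The braces around the sections mark exactly which portions may run
// concurrently; the left and right halves never touch the same elements.
void QSort(int* data, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = data[hi];
    int mid = lo;
    for (int i = lo; i < hi; ++i) {
        if (data[i] < pivot) std::swap(data[i], data[mid++]);
    }
    std::swap(data[mid], data[hi]);
    #pragma omp parallel sections
    {
        #pragma omp section
        QSort(data, lo, mid - 1);   // left half, independent of the right
        #pragma omp section
        QSort(data, mid + 1, hi);   // right half
    }
}
```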
  14. A better solution
     • Calculate a handful of partitions, then do a high-level parallelization on the resulting partitions. Given a decent partition function, each partition will be roughly the same size, and the resulting performance gains can be worth the effort
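The talk's exact partitioning code is not shown; one way to sketch the idea (using `std::nth_element` to place the partition boundaries, and an assumed partition count of 4) is to split the array once into independent ranges, then sort each range in a single parallel loop:

```cpp
#include <algorithm>
#include <vector>

// Partition-first parallel sort: place a handful of boundary elements at
// their final sorted positions with nth_element, so every element in one
// partition is <= every element in the next. Each partition is then fully
// independent and can be sorted by its own thread in one parallel for.
void ParallelSort(std::vector<int>& v) {
    const int kParts = 4;  // assumed partition count (e.g. one per core)
    const int n = static_cast<int>(v.size());
    std::vector<int> bounds;
    bounds.push_back(0);
    for (int p = 1; p < kParts; ++p) {
        int b = n * p / kParts;
        // Partitions [bounds.back(), b) and [b, n) around the boundary.
        std::nth_element(v.begin() + bounds.back(), v.begin() + b, v.end());
        bounds.push_back(b);
    }
    bounds.push_back(n);
    // High-level parallelization: one roughly equal-sized chunk per iteration.
    #pragma omp parallel for
    for (int p = 0; p < kParts; ++p) {
        std::sort(v.begin() + bounds[p], v.begin() + bounds[p + 1]);
    }
}
```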
  15. Performance
     • Given an array of 1,000,000 random integers:

       Hardware                      | OpenMP threads | OpenMP perf gain
       Dual-core 2.3-GHz Pentium     | 4              | 1.5x
       Triple-core 3.2-GHz Xbox 360  | 6              | 1.4x
  16. OpenMP flaws
     • OpenMP is not designed for all multithreading issues. Complex multithreading scenarios and synchronization are best done using native thread techniques. If you’re writing a new engine, using OpenMP is not the way to go
     • OpenMP-enabled compilers typically don’t check to ensure that your code will parallelize correctly
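The slide's code is not in the transcript; a reconstruction of the kind of order-dependent sections it warns about might look like this. Compiled serially the function "works", which is exactly why the bug is easy to miss; under OpenMP the three sections may run concurrently, so the later sections can read `n` before it has been written.

```cpp
// Order-dependent parallel sections: a sketch of the bug, not code to
// imitate. The second and third sections rely on the first completing
// with a valid value for n, but OpenMP gives no such ordering guarantee,
// so b and c race against the write to n. Most compilers accept this
// without complaint.
int BrokenSections() {
    int n = 0, b = 0, c = 0;
    #pragma omp parallel sections
    {
        #pragma omp section
        n = 42;        // must finish first for the code to be correct
        #pragma omp section
        b = n + 1;     // data race: may read n before it is written
        #pragma omp section
        c = n * 2;     // same race
    }
    return b + c;
}
```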
  17. OpenMP flaws
     • Most compilers will compile code like this without errors, even though the second and third sections rely on the first section completing with a valid value for n. Very likely the program will crash. It’s up to the programmer to ensure that OpenMP is applied only to constructs that are not order-dependent
     • Another gotcha is debugging. When the compiler encounters an OpenMP block, it generates custom code that calls into the OpenMP runtime. Unfortunately, the internals of OpenMP are a black box, which can make debugging very difficult
     • Finally, using OpenMP does not guarantee improved performance on multiprocessor systems. Depending on your usage, the runtime overhead of OpenMP can dwarf any benefits. For instance, when numParticles in the first example was on the order of 100, the performance gain was negligible
  18. Conclusion
     • OpenMP is a quick and useful technique for utilizing multicore and hyperthreaded processors. It’s easy enough to be used for last-minute optimizations, yet flexible enough to use in cross-platform code. It’s not without flaws, but used properly, OpenMP can be very beneficial
     • Potential applications of OpenMP in games include particle systems, skinning, collision detection, simulations, pathfinding, vertex transforms, signal processing, procedural synthesis, and fractals
     • Even though OpenMP has been around for quite a while, it’s a new technology for most game developers. The best resource is the OpenMP specification, which is available at openmp.org. The specification is concise and very readable
  19. Q&A
  20. Thank you!