
Profiling and Optimizing for Xeon Phi with Allinea MAP


Discovering bottlenecks without pain with Allinea MAP and Xeon Phi

Published in: Technology

  1. 1. Profiling and optimizing for Xeon Phi with Allinea MAP: Discovering bottlenecks without pain
  2. 2. What is happening? (Chart: performance against time in years, across the Single-Core Era, Multi-Core Era and Many-Core Era.) Constraint lists, in slide order: (1) power, complexity of algorithms; (2) power, parallel software availability, scalability; (3) programming models.
  3. 3. Allinea MAP: Increase application performance
     • Parallel profiler designed for:
       ‒ C/C++, Fortran
       ‒ MPI code: interdependent or independent processes
       ‒ Multithreaded code: monitor the main threads for each process
       ‒ Accelerated codes: GPUs, Intel Xeon Phi
     • Improve productivity:
       ‒ Helps you detect performance issues quickly and easily
       ‒ Tells you immediately where your time is spent in your source code
       ‒ Helps you to optimize your application efficiently
  4. 4. Allinea MAP 4.1: New features at ISC 2013
     • Support for I/O metrics
       ‒ I/O can be a major bottleneck in HPC systems
       ‒ Find the optimal configuration for your file system
       Benefit: broader profiling and analysis capabilities to solve even more performance issues.
     • Support for Intel Xeon Phi
       ‒ Already supported on Allinea DDT
       ‒ Officially extended to profiling
       Benefit: ensure you are getting the best performance from new technology.
  5. 5. Intel Xeon Phi and Allinea
     2011
       • Started architecture and tools discussions with Intel
       • Early development prototypes exchanged
     2012
       • Full debugger support for Intel MIC architecture
       • Official 3.2 release
       • Feedback from early adopters
     2013
       • Profiling support for Intel Xeon Phi announced
       • #1 Green 500 system, Xeon Phi-powered Beacon chooses Allinea
       • Dramatic surge in interest in debugging and profiling on Xeon Phi
  6. 6. Optimizing for the Xeon Phi: Where do you start? “Code that’s well-optimized for the host usually performs pretty well on the cards” - Pretty much everyone
  7. 7. Optimizing for the Xeon Phi: But what matters? (Chart: performance, split into vectorization and other stuff.)
  8. 8. Optimizing for the Xeon Phi: Is my code well-vectorized? … maybe?
  9. 9. Optimizing for the Xeon Phi: Is my code well-vectorized? … maybe?
  10. 10. Optimizing for the Xeon Phi: Is my code well-vectorized? … maybe? Not in this loop (16.5% of total time).
  11. 11. Optimizing for the Xeon Phi: Non-obvious tradeoffs
  12. 12. Optimizing for the Xeon Phi: Non-obvious tradeoffs. Here a loop taking 55% of total runtime isn’t vectorized at all. Taking the unvectorizable rand() out of the loop allows the sqrt workload to be fully vectorized: reverse loop fusion (loop fission)!
  13. 13. Optimizing for the Xeon Phi: Non-obvious tradeoffs. Now the floating-point workload is fully vectorized. But all the time is being spent in the random number generation, so that’s what really needs to be optimized.
  14. 14. Optimizing for the Xeon Phi: Know your tools. Replace rand() with Intel’s vectorized version and re-fuse the loop to retain temporal cache locality benefits.
  15. 15. Optimizing for the Xeon Phi: The full picture. You need to see the full picture to spot these tradeoffs. Allinea MAP shows you the way.
  16. 16. Optimizing for the Xeon Phi: Running on the card. Allinea MAP runs with full metrics on Xeon Phi cards!
  17. 17. Optimizing for the Xeon Phi: Running on the card. This makes it easy to compare and learn versus the host.
  18. 18. Allinea DDT: Unified interface for debugging
     • Full, graphical debugger designed for:
       ‒ C/C++, Fortran, Xeon Phi, UPC, …
       ‒ MPI, OpenMP and mixed-mode code
     • Unified interface with Allinea MAP:
       ‒ Just what you need when you’ve added OpenMP and now everything segfaults!
       ‒ One interface eliminates learning curve
       ‒ Spend more time on your results
     • Slash your time to develop:
       ‒ Reproduces and triggers your bugs instantly
       ‒ Helps you quickly understand where issues come from
       ‒ Helps you to fix them as swiftly as possible
  19. 19. Allinea Software
     • Ten years of high-quality development tools
       ‒ Leading in HPC software tools market worldwide
       ‒ Global customer base
     • Making parallel programming accessible to the widest range of scientists and programmers
       ‒ Design an unrivaled, productive and easy-to-use development environment…
       ‒ … to help you reach the highest level of performance and scalability
       ‒ Define a new standard of customer support
  20. 20. Summary
     • Allinea’s tools are the premier Xeon Phi development environment
       – See at a glance which loops to vectorize and which to ignore
       – Full profiling metrics available on the Xeon Phi cards
       – Unified interface with Allinea DDT keeps you productive, whatever you’re working on
     To learn more, visit us at our booth #655!
  21. 21. Thank you. Your contacts: – Technical Support team: – Sales team: