• Like
  • Save
Profiling and Optimizing for Xeon Phi with Allinea MAP
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Profiling and Optimizing for Xeon Phi with Allinea MAP

  • 86 views
Published

Discovering bottlenecks without pain with Allinea MAP and Xeon Phi

Discovering bottlenecks without pain with Allinea MAP and Xeon Phi

Published in Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
86
On SlideShare
0
From Embeds
0
Number of Embeds
1

Actions

Shares
Downloads
0
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Profiling and optimizing for Xeon Phiwith Allinea MAPDiscovering bottlenecks without pain
  • 2. What is happening ?Single Core Era Multi-Core Era Many-Core EraConstraints :-Power-Complexity of algorithmsConstraints :-Power-Parallel software availability-ScalabilityConstraints :-Programming modelsPerformanceTime(years)
  • 3. • Parallel profiler designed for:‒ C/C++, Fortran‒ MPI code Interdependent or independent processes‒ Multithreaded code Monitor the main threads for each process‒ Accelerated codes GPUs, Intel Xeon Phi• Improve productivity :‒ Helps you detect performance issues quickly and easily‒ Tells you immediately where your time is spent in your source code‒ Helps you to optimize your application efficientlyAllinea MAPIncrease application performance
  • 4. • Support for I/O metrics‒ I/O can be a major bottleneck in HPC systems‒ Find the optimal configuration for your file system.Benefit : Broader profiling and analysis capabilities tosolve even more performance issues.• Support for Intel Xeon Phi‒ Already supported on Allinea DDT‒ Officially extended to profilingBenefit : Ensure you are getting the best performancefrom new technology.Allinea MAP 4.1New features at ISC 2013
  • 5. Intel Xeon Phi and Allinea• Started architecture and tools discussions with Intel• Early development prototypes exchanged2011• Full debugger support for Intel MIC architecture• Official 3.2 release• Feedback from early adopters2012• Profiling support for Intel Xeon Phi announced• #1 Green 500 system, Xeon Phi-powered Beacon choosesAllinea• Dramatic surge in interest in debugging and profiling onXeon Phi2013
  • 6. Optimizing for the Xeon PhiWhere do you start?“Code that’s well-optimized for the hostusually performs pretty well on the cards”- Pretty much everyone
  • 7. Optimizing for the Xeon PhiBut what matters?VectorizationOtherstuffPerformance
  • 8. Optimizing for the Xeon PhiIs my code well-vectorized?… maybe?
  • 9. Optimizing for the Xeon PhiIs my code well-vectorized?… maybe?
  • 10. Optimizing for the Xeon PhiIs my code well-vectorized?… maybe?Not in this loop(16.5% of total time)
  • 11. Optimizing for the Xeon PhiNon-obvious tradeoffs
  • 12. Optimizing for the Xeon PhiNon-obvious tradeoffsHere a loop taking55% of total runtimeisn’t vectorized at allTaking the unvectorizable rand() out of the loopallows the sqrt workload to be fully-vectorized –reverse loop fusion!
  • 13. Optimizing for the Xeon PhiNon-obvious tradeoffsNow the floating-point workload isfully-vectorizedBut all the time is being spent in the randomnumber generation, so that’s what really needs tobe optimized
  • 14. Optimizing for the Xeon PhiKnow your toolsReplace rand() with Intel’s vectorized version and re-fuse the loopto retain temporal cache locality benefits
  • 15. Optimizing for the Xeon PhiThe full pictureYou need to see the full picture to spot thesetradeoffs – Allinea MAP shows you the way
  • 16. Optimizing for the Xeon PhiRunning on the cardAllinea MAP runs with full metrics on Xeon Phi cards!
  • 17. Optimizing for the Xeon PhiRunning on the cardThis makes it easy to compare and learn versus the host
  • 18. • Full, graphical debugger designed for :‒ C/C++, Fortran, Xeon Phi, UPC, …‒ MPI, OpenMP and mixed-mode code• Unified interface with Allinea MAP :‒ Just what you need when you’ve addedOpenMP and now everything segfaults!‒ One interface eliminates learning curve‒ Spend more time on your results• Slash your time to develop :‒ Reproduces and triggers your bugs instantly‒ Helps you easily understand where issues come from quickly‒ Helps you to fix them as swiftly as possibleAllinea DDTUnified interface for debugging
  • 19. • Ten years of high-quality development tools‒ Leading in HPC software tools market worldwide‒ Global customer base• Making parallel programming accessible to the widestrange of scientists and programmers‒ Design an unrivaled productive and easy-to-use development environment…‒ … To help you reach the highest level of performance and scalability‒ Define a new standard of customer supportAllinea Software
  • 20. Summary• Allinea’s tools are the premier Xeon Phi developmentenvironment– See at a glance which loops to vectorized and which toignore– Full profiling metrics available on the Xeon Phi cards– Unified interface with Allinea DDT keeps you productive,whatever you’re working onTo learn more, visit us at ourbooth #655 !
  • 21. Thank youYour contacts :– Technical Support team : support@allinea.com– Sales team : sales@allinea.com