HPC Performance & Development Tuning tools for scientists to go parallel faster with allinea
Published in Technology, Business
  • It’s not often that marketing lives up to its hype, but something we’ve consistently heard from users around the world porting their codes to Xeon Phi is that – once they’ve done a good job of optimizing for the host – the performance on the Phi is normally pretty good right away.
  • The reason is that even on a standard Xeon these days, you need to take advantage of vectorized instructions to get good performance. With 512-bit registers, vectorization is absolutely critical to achieving good performance on the Xeon Phi. There’s no point in sending all the cars down one lane of the highway!
  • The Intel compilers can give very detailed reports about what they’re doing to each loop using the -vec-report flags, but even on a small program you need to know which loops are worth spending your time on and which you can ignore.
  • Allinea MAP shows you the behavior of your code at a single glance – let me briefly walk you through the interface here. <Talk about how to interpret the metric graphs and the sparkline graphs next to the code viewer. Finish by pointing out that the CPU floating-point vector graph is at 0 for the selected region of time!>
  • This is Allinea MAP’s answer to our question – there’s an important loop taking 16.5% of the total program time that isn’t vectorizing at all! Now we know which lines of code are affected, we can ask the compiler for a report and investigate further.
  • It’s not just profiling that works the same – our unified interface is shared with Allinea DDT, a full-featured debugger supporting a huge range of platforms and codes including the Xeon Phi.
  • You can’t achieve full performance by looking through a microscope all the time – you have to be able to step back from the quest to vectorize the next loop, and the next, and ask “Is this worth it? Is there a library I can use here? Can I refactor my code here?” MAP gives you the oversight and insight you need to answer these questions.
  • And when you come to run your code on the card, Allinea MAP gathers exactly the same information and displays it in exactly the same way.
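
To make the vectorization point in the notes concrete, here is a minimal C sketch of two loops: one a compiler can usually vectorize and one it usually cannot. The function names `scale_add` and `running_sum` are hypothetical and do not come from the talk or from any Allinea or Intel example. They only illustrate the kind of difference a vectorization report (for example from the -vec-report flags mentioned in the notes) would be expected to show.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: each y[i] depends only on x[i] and y[i].
 * `restrict` promises the compiler that x and y do not overlap,
 * which is typically what it needs to vectorize this loop. */
void scale_add(size_t n, double a, const double *restrict x,
               double *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependence: iteration i reads the value written by
 * iteration i - 1, so the iterations cannot simply be packed side
 * by side into one vector instruction. */
void running_sum(size_t n, double *y)
{
    assert(y != NULL);
    for (size_t i = 1; i < n; ++i)
        y[i] += y[i - 1];
}
```

If a profiler shows a loop like `running_sum` dominating the run time with no vector activity, that is a candidate for restructuring or for replacing with a library routine, rather than for compiler flags alone.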

Transcript

  • 1. Get Performance on Intel® Xeon Phi™ with Allinea MAP and Allinea DDT Discovering bottlenecks without pain
  • 2. In my Parallel Universe… … we develop new antibiotics faster than bacteria develop resistance ... every household can prototype and evolve their own 3D-printed designs … accurate simulation of the natural world is taken for granted
  • 3. So I decided to… … create parallel development tools for scientists: We’re accelerating the pace of scientific progress
  • 4. HPC on the critical path to progress [Figure: performance against time (years) across the Single-Core, Multi-Core and Many-Core eras, each annotated with its constraints: power, scalability, parallel software availability, programming models, complexity of algorithms]
  • 5. Allinea MAP Increase application performance
    • Parallel profiler designed for:
      ‒ C/C++, Fortran
      ‒ MPI code: interdependent or independent processes
      ‒ Multithreaded code: monitor the main threads for each process
      ‒ Accelerated codes: GPUs, Intel® Xeon Phi™
    • Improve productivity:
      ‒ Helps you detect performance issues quickly and easily
      ‒ Tells you immediately where your time is spent in your source code
      ‒ Helps you to optimize your application efficiently
  • 6. Allinea MAP 4.2 New features in 2013
    • Support for I/O metrics
      ‒ I/O can be a major bottleneck in HPC systems
      ‒ Find the optimal configuration for your file system
      Benefit: Broader profiling and analysis capabilities to solve even more performance issues.
    • Support for Intel® Xeon Phi™
      ‒ Already supported on Allinea DDT
      ‒ Officially extended to profiling
      Benefit: Ensure you are getting the best performance from new technology.
  • 7. Optimizing for Intel® Xeon Phi™ Where do you start? “Code that’s well-optimized for the host usually performs pretty well on the cards” - Almost everybody
  • 8. Optimizing for Intel® Xeon Phi™ But what matters? Vectorization Performance Other stuff
  • 9. Optimizing for Intel® Xeon Phi™ Is my code well-vectorized? … maybe?
  • 10. Allinea Performance Reports Is my code well-vectorized?
  • 11. Optimizing for Intel® Xeon Phi™ Is my code well-vectorized? … maybe?
  • 12. Optimizing for Intel® Xeon Phi™ Is my code well-vectorized? Not in this loop (16.5% of total time) … maybe?
  • 13. Allinea DDT Unified interface for debugging
    • Full, graphical debugger designed for:
      ‒ C/C++, Fortran, Intel® Xeon Phi™, UPC, …
      ‒ MPI, OpenMP and mixed-mode code
    • Unified interface with Allinea MAP:
      ‒ Just what you need when you’ve added OpenMP and now everything segfaults!
      ‒ One interface eliminates learning curve
      ‒ Spend more time on your results
    • Slash your time to develop:
      ‒ Reproduces and triggers your bugs instantly
      ‒ Helps you quickly understand where issues come from
      ‒ Helps you to fix them as swiftly as possible
  • 14. Allinea at the forefront of science with COSMOS and Intel® Xeon Phi™ “While I was porting CAMB to offload certain parts of it to Intel® Xeon Phi™, I wasted weeks debugging it because the offloads were basically opaque. I only had print statements to help me.”
  • 15. Allinea at the forefront of science with COSMOS and Intel® Xeon Phi™ “Using DDT's new offload debugging I can now look at the offload code and look at the state of the array on the Intel® Xeon Phi™ side before it is manipulated”
  • 16. Allinea at the forefront of science with COSMOS and Intel® Xeon Phi™ “Fix is easy - either set NOCOPY->IN or just set the thing to zero on the MIC side which is probably cheaper.”
  • 17. Allinea at the forefront of science with COSMOS and Intel® Xeon Phi™ “I’m now using MAP – it shows that the code is fairly well vectorised at 70%. This will have to be improved a bit to get the most out of the coprocessors.”
  • 18. Allinea Software
    • Ten years of high-quality development tools
      ‒ Leading in HPC software tools market worldwide
      ‒ Global customer base
    • Making parallel programming accessible to the widest range of scientists and programmers
      ‒ Design an unrivaled productive and easy-to-use development environment…
      ‒ … To help you reach the highest level of performance and scalability
      ‒ Define a new standard of customer support
  • 19. Summary The premier Intel® Xeon Phi™ development environment from Allinea
    – Is your code ready for Intel® Xeon Phi™? Run a Performance Report!
    – See which loops are important to vectorize with Allinea MAP
    – Stay productive with full profiling and debugging on both host and coprocessor
    – Powerful unified interface with industry-leading technical support to help you get the job finished faster
    Visit us at our booth #1719 to see this in action! Enter our Performance Reports competition to win a Kindle Fire every day!
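
As a rough way to judge whether the unvectorized loop on slide 12 is worth the effort, Amdahl's law gives an upper bound on the whole-program gain. The symbols are assumptions introduced here, not taken from the slides: f is the fraction of run time spent in the loop (16.5%), and k is the speedup of that loop alone. The value k = 8 is an idealized guess based on eight double-precision values fitting in one 512-bit register; real loops rarely reach it.

```latex
S = \frac{1}{(1 - f) + \dfrac{f}{k}}
  = \frac{1}{(1 - 0.165) + \dfrac{0.165}{8}}
  = \frac{1}{0.855625}
  \approx 1.17
```

Under these assumptions, perfectly vectorizing that one loop would make the whole program at most about 17% faster.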