Get Performance on Intel® Xeon Phi™ with
Allinea MAP and Allinea DDT
Discovering bottlenecks without pain
HPC Performance & Development Tuning tools for scientists to go parallel faster with Allinea

Published in: Technology, Business
  • It’s not often that marketing lives up to its hype, but something we’ve consistently heard from users around the world porting their codes to Xeon Phi is that – once they’ve done a good job of optimizing for the host – the performance on the Phi is normally pretty good right away.
  • The reason is that even on a standard Xeon these days, you need to take advantage of vectorized instructions to get good performance. With 512-bit registers, vectorization is absolutely critical to achieving good performance on the Xeon Phi. There’s no point in sending all the cars down one lane of the highway!
  • The Intel compilers can give very detailed reports about what they’re doing to each loop using the -vec-report flags, but even on a small program you need to know which loops are worth spending your time on and which you can ignore.
  • Allinea MAP shows you the behavior of your code at a single glance – let me briefly walk you through the interface here. [Talk about how to interpret the metric graphs and the sparkline graphs next to the code viewer. Finish by pointing out that the CPU floating-point vector graph is at 0 for the selected region of time!]
  • This is Allinea MAP’s answer to our question – there’s an important loop taking 16.5% of the total program time that isn’t vectorizing at all! Now that we know which lines of code are affected, we can ask the compiler for a report and investigate further.
  • It’s not just profiling that works the same – our unified interface is shared with Allinea DDT, a full-featured debugger supporting a huge range of platforms and codes including the Xeon Phi.
  • You can’t achieve full performance by looking through a microscope all the time – you have to be able to step back from the quest to vectorize the next loop, and the next, and ask “is this worth it? Is there a library I can use here? Can I refactor my code here?” MAP gives you the oversight and insight you need to answer these questions.
  • And when you come to run your code on the card, Allinea MAP gathers exactly the same information and displays it in exactly the same way.
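To make the vectorization point in these notes concrete, here is a minimal C sketch. It is an illustration only, not the loop from the talk, and the function names (`lanes_per_register`, `scale_add`, `running_sum`) are hypothetical. A 512-bit register holds 16 single-precision or 8 double-precision values, so a loop whose iterations are independent can in principle process that many elements per vector instruction, while a loop that carries a dependence between iterations usually cannot be vectorized as written.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements of a given size that fit in one vector register. */
size_t lanes_per_register(size_t register_bits, size_t element_bytes)
{
    assert(element_bytes > 0);
    return register_bits / (8 * element_bytes);
}

/* Independent iterations: y[i] depends only on x[i] and the old y[i],
   so a vectorizing compiler may process several elements per instruction.
   The restrict qualifiers promise that x and y do not overlap. */
void scale_add(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependence: y[i] needs y[i-1] from the previous iteration,
   which usually blocks straightforward vectorization. */
void running_sum(size_t n, const float *x, float *y)
{
    if (n == 0)
        return;
    y[0] = x[0];
    for (size_t i = 1; i < n; ++i)
        y[i] = y[i - 1] + x[i];
}
```

A compiler vectorization report for a file like this would be expected to treat the two loops differently, which is the kind of follow-up the notes suggest once a profiler has shown which loop matters.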
  • Transcript of "HPC Performance & Development Tuning tools for scientists to go parallel faster with allinea"

    1. Get Performance on Intel® Xeon Phi™ with Allinea MAP and Allinea DDT: Discovering bottlenecks without pain
    2. In my Parallel Universe…
       … we develop new antibiotics faster than bacteria develop resistance
       … every household can prototype and evolve their own 3D-printed designs
       … accurate simulation of the natural world is taken for granted
    3. So I decided to… create parallel development tools for scientists: We’re accelerating the pace of scientific progress
    4. HPC on the critical path to progress
       [Chart: performance against time (years) across the Single Core, Multi-Core and Many-Core eras. Constraints listed: power, parallel software availability, scalability, programming models, complexity of algorithms.]
    5. Allinea MAP: Increase application performance
       • Parallel profiler designed for:
         ‒ C/C++, Fortran
         ‒ MPI code: interdependent or independent processes
         ‒ Multithreaded code: monitor the main threads for each process
         ‒ Accelerated codes: GPUs, Intel® Xeon Phi™
       • Improve productivity:
         ‒ Helps you detect performance issues quickly and easily
         ‒ Tells you immediately where your time is spent in your source code
         ‒ Helps you to optimize your application efficiently
    6. Allinea MAP 4.2: New features in 2013
       • Support for I/O metrics
         ‒ I/O can be a major bottleneck in HPC systems
         ‒ Find the optimal configuration for your file system
         Benefit: Broader profiling and analysis capabilities to solve even more performance issues.
       • Support for Intel® Xeon Phi™
         ‒ Already supported on Allinea DDT
         ‒ Officially extended to profiling
         Benefit: Ensure you are getting the best performance from new technology.
    7. Optimizing for Intel® Xeon Phi™: Where do you start? “Code that’s well-optimized for the host usually performs pretty well on the cards” - Almost everybody
    8. Optimizing for Intel® Xeon Phi™: But what matters? [Figure labels: Vectorization, Performance, Other stuff]
    9. Optimizing for Intel® Xeon Phi™: Is my code well-vectorized? … maybe?
    10. Allinea Performance Reports: Is my code well-vectorized?
    11. Optimizing for Intel® Xeon Phi™: Is my code well-vectorized? … maybe?
    12. Optimizing for Intel® Xeon Phi™: Is my code well-vectorized? Not in this loop (16.5% of total time) … maybe?
    13. Allinea DDT: Unified interface for debugging
       • Full, graphical debugger designed for:
         ‒ C/C++, Fortran, Intel® Xeon Phi™, UPC, …
         ‒ MPI, OpenMP and mixed-mode code
       • Unified interface with Allinea MAP:
         ‒ Just what you need when you’ve added OpenMP and now everything segfaults!
         ‒ One interface eliminates learning curve
         ‒ Spend more time on your results
       • Slash your time to develop:
         ‒ Reproduces and triggers your bugs instantly
         ‒ Helps you quickly understand where issues come from
         ‒ Helps you to fix them as swiftly as possible
    14. Allinea at the forefront of science with COSMOS and Intel® Xeon Phi™: “While I was porting CAMB to offload certain parts of it to Intel® Xeon Phi™, I wasted weeks debugging it because the offloads were basically opaque. I only had print statements to help me.”
    15. Allinea at the forefront of science with COSMOS and Intel® Xeon Phi™: “Using DDT's new offload debugging I can now look at the offload code and look at the state of the array on the Intel® Xeon Phi™ side before it is manipulated”
    16. Allinea at the forefront of science with COSMOS and Intel® Xeon Phi™: “Fix is easy - either set NOCOPY->IN or just set the thing to zero on the MIC side which is probably cheaper.”
    17. Allinea at the forefront of science with COSMOS and Intel® Xeon Phi™: “I’m now using MAP – it shows that the code is fairly well vectorised at 70%. This will have to be improved a bit to get the most out of the coprocessors.”
    18. Allinea Software
       • Ten years of high-quality development tools
         ‒ Leading in HPC software tools market worldwide
         ‒ Global customer base
       • Making parallel programming accessible to the widest range of scientists and programmers
         ‒ Design an unrivaled productive and easy-to-use development environment…
         ‒ … To help you reach the highest level of performance and scalability
         ‒ Define a new standard of customer support
    19. Summary: The premier Intel® Xeon Phi™ development environment from Allinea
       – Is your code ready for Intel® Xeon Phi™? Run a Performance Report!
       – See which loops are important to vectorize with Allinea MAP
       – Stay productive with full profiling and debugging on both host and coprocessor
       – Powerful unified interface with industry-leading technical support to help you get the job finished faster
       Visit us at our booth #1719 to see this in action! Enter our Performance Reports competition to win a Kindle Fire every day!
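The fix quoted on slide 16 can be pictured with a plain C model. This is a sketch under assumptions, not Intel's offload syntax and not the CAMB code: only the clause names NOCOPY and IN come from the slide, while `offload_accumulate`, `device_zero` and the `transfer` enum are hypothetical stand-ins. The idea is that when nothing is copied, the coprocessor-side array keeps whatever it held before, so either the host data is transferred in or the array is zeroed on the device side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two data-transfer choices named on the slide. */
enum transfer { TRANSFER_NOCOPY, TRANSFER_IN };

/* Stand-in for one offloaded region: optionally copy the host array into
   the "device" buffer, then accumulate into it on the device side. */
void offload_accumulate(double *device_buf, const double *host_buf, size_t n,
                        enum transfer mode, double increment)
{
    if (mode == TRANSFER_IN)
        memcpy(device_buf, host_buf, n * sizeof *host_buf);
    /* With TRANSFER_NOCOPY the device buffer keeps its previous contents. */
    for (size_t i = 0; i < n; ++i)
        device_buf[i] += increment;
}

/* The alternative fix from the quote: clear the array on the device side
   instead of transferring zeros from the host. */
void device_zero(double *device_buf, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        device_buf[i] = 0.0;
}
```

In this model, a stale device buffer combined with `TRANSFER_NOCOPY` yields wrong sums, and both fixes give the same result. Zeroing in place avoids a transfer, which is consistent with the quote's remark that it is probably cheaper.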