Early Successes Debugging with TotalView on the Intel Xeon Phi Coprocessor

Rogue Wave see performance gains and correct results with Xeon Phi

Transcript

  • 1. Early Successes Debugging with TotalView on the Intel Xeon Phi Coprocessor
    Chris Gottbrath, Principal Product Manager
    June 19th, 2013
  • 2. Rogue Wave Today
    History
      • Founded: 1989
      • Acquired by Audax Group: 2012
      • Acquired:
        – TotalView Technologies: 2009
      • 40 years of experience in HPC
    Customers
      • 3,000+ customers in 36 countries
      • Multiple sectors:
        – Financial services
        – Telecom
        – Oil and gas
        – Government and aerospace
        – Research and academic
    The largest independent provider of cross-platform software development tools and embedded components for the next generation of HPC applications.
    Highlights
      • Pioneers in C++/object-oriented development
      • Leading the way in cross-platform, parallel development
    | Copyright © 2013 Rogue Wave Software | All Rights Reserved
  • 3. What is TotalView®?
      • Application Analysis and Debugging Tool: Code Confidently
        – Debug and analyse C/C++ and Fortran on Linux, Unix or Mac OS X
        – Laptops to supercomputers (such as Cray® XC)
        – Makes developing, maintaining, and supporting critical apps easier and less risky
      • Major Features
        – Easy-to-learn graphical user interface with data visualization
        – Parallel Debugging
          • MPI, Pthreads, OpenMP™
          • Intel® Xeon Phi™ coprocessor
        – Includes a Remote Display Client which frees you to work from anywhere
        – Memory Debugging with MemoryScape™
        – Deterministic Replay Capability Included on Linux™/x86-64
        – Non-interactive Batch Debugging with TVScript and the CLI
        – TTF & C++View to transform user-defined objects
  • 4. TotalView for the Intel Xeon Phi coprocessor
      • Supports Multiple Intel Xeon Phi coprocessor configurations
        – Native Mode
          • With MPI
        – Offload Directives
          • Similar to GPU
        – Multi-device
        – Multi-node
          • Certain configurations
            – CS300-AC, Future XC30
      • User Interface
        – MPI Debugging Features
          • Process Control
          • View Across
          • Shared Breakpoints
        – Heterogeneous Debugging
          • Debug Both Xeon and Intel Xeon Phi Processes
  • 5. The Beacon Project
    Beacon – Phase 1 (Cray CS300-AC Cluster Supercomputer)
      Nodes:                                   2 service, 16 compute
      Interconnect:                            FDR IB Fat Tree
      CPU model:                               Intel Xeon E5-2670
      CPUs per node:                           2 x 8-core, 2.6 GHz
      RAM per node:                            64 GB
      SSD per node:                            80 GB
      Intel® Xeon Phi™ coprocessors per node:  2 x pre-production, 50+ cores, 8 GB GDDR5 RAM
    Beacon – Phase 2 (Cray CS300-AC Cluster Supercomputer)
      Nodes:                                   4 service, 6 I/O, 48 compute
      Interconnect:                            FDR IB Fat Tree
      CPU model:                               Intel Xeon E5-2670
      CPUs per node:                           2 x 8-core, 2.6 GHz
      RAM per node:                            256 GB
      SSD per node:                            2 x 480 GB (compute), 16 x 300 GB (I/O)
      Intel® Xeon Phi™ coprocessors per node:  4 x 5110P, 60-core, 1.053 GHz, 8 GB GDDR5 RAM
      • Funded by NSF to port and optimize scientific codes to the Intel® Xeon Phi™ coprocessor
      • State-funded expansion focuses on energy efficiency, big data applications, and industry
      • Example Codes: PSC, H3D, OMEN, ENZO, MADNESS, NWCHEM, Amber, MILC, and MAGMA
    This material is based upon work supported by the National Science Foundation under Grant Number 1137097. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
  • 6. Beacon Project
      • 210 Tflops, 11,520 Xeon Phi Coprocessor Cores
      • #1 on the Green 500
      • 9 scientific teams optimizing apps including: MHD, Plasma, Cosmology, Chemistry, QCD, Bio-informatics…
      • Porting & optimization
        – Hybrid MPI + OpenMP
        – Many more threads than previous paradigms
      • Subtle issues might present themselves
        – And did
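The core count on this slide can be cross-checked against the Phase 2 hardware figures on the previous slide. The short sketch below does that arithmetic and also estimates a peak figure. The node, coprocessor, core and clock numbers come from the slides; the flops-per-cycle values are assumptions added here (they are not stated in the deck), so the peak estimate should be read as a plausibility check rather than the presenter's own calculation.

```python
# Beacon Phase 2 figures as given on the slides.
COMPUTE_NODES = 48
PHIS_PER_NODE = 4
CORES_PER_PHI = 60
PHI_CLOCK_GHZ = 1.053
CPUS_PER_NODE = 2
CORES_PER_CPU = 8
CPU_CLOCK_GHZ = 2.6

# ASSUMPTIONS (not on the slides): peak double-precision flops per core per cycle.
PHI_FLOPS_PER_CYCLE = 16  # assumed: 512-bit vectors with fused multiply-add
CPU_FLOPS_PER_CYCLE = 8   # assumed: 256-bit AVX, one add plus one multiply


def phi_cores():
    """Total Xeon Phi cores across all compute nodes."""
    return COMPUTE_NODES * PHIS_PER_NODE * CORES_PER_PHI


def peak_tflops():
    """Estimated peak (coprocessors plus host CPUs) in Tflops."""
    phi_gflops = phi_cores() * PHI_CLOCK_GHZ * PHI_FLOPS_PER_CYCLE
    cpu_gflops = (COMPUTE_NODES * CPUS_PER_NODE * CORES_PER_CPU
                  * CPU_CLOCK_GHZ * CPU_FLOPS_PER_CYCLE)
    return (phi_gflops + cpu_gflops) / 1000.0


print(phi_cores())           # 48 * 4 * 60
print(round(peak_tflops()))  # rounded estimate under the assumptions above
```

Under these assumptions the core count matches the 11,520 quoted on the slide, and the estimated peak rounds to the quoted 210 Tflops.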
  • 7. Debugging on Beacon with TotalView
      • OpenMP Hybridization of Boltzmann BGK
        – Correctness issues came up with the OpenMP code
        – Troubleshooting with TotalView
          • Native mode debugging on the Xeon Phi
          • Thread level examination of the OpenMP region
          • Comparison of data between threads
          • … Clarified otherwise puzzling results
        – Developers were able to resolve the correctness issue
        – Ultimately obtained performance gains on the Xeon Phi
          • And correct results
  • 8. Debugging on Beacon with TotalView
      • Porting Gyro Tokamak Plasma Simulation to the Xeon Phi
        – Intermittent crash due to Out Of Memory (OOM) Condition
        – Troubleshooting with TotalView
          • TotalView was used to diagnose the issue across MPI processes
          • Work was not being distributed evenly
          • … the routine had an invalid assumption
        – Developers were able to resolve the OOM error
        – Better load balancing also improved the performance of the code
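The slide does not say what the invalid assumption in the routine was. As a purely hypothetical illustration of how an uneven work distribution can inflate one process's memory footprint, the sketch below compares a decomposition that hands the whole remainder to the last rank with a balanced block decomposition. The function names and the item and rank counts are invented for this example and are not taken from the Gyro code.

```python
def naive_counts(n_items, n_ranks):
    """Hypothetical flawed split: every rank gets n_items // n_ranks items,
    and the last rank silently absorbs the entire remainder."""
    base = n_items // n_ranks
    counts = [base] * n_ranks
    counts[-1] += n_items - base * n_ranks
    return counts


def balanced_counts(n_items, n_ranks):
    """Balanced block split: the first (n_items % n_ranks) ranks get one
    extra item, so no two ranks differ by more than one item."""
    base, extra = divmod(n_items, n_ranks)
    return [base + 1 if rank < extra else base for rank in range(n_ranks)]


def imbalance(counts):
    """Ratio of the heaviest rank's load to the mean load (1.0 is ideal)."""
    return max(counts) / (sum(counts) / len(counts))


# Illustrative numbers only: 1000 work items over 64 ranks.
naive = naive_counts(1000, 64)
balanced = balanced_counts(1000, 64)
print(max(naive), max(balanced))
print(imbalance(naive), imbalance(balanced))
```

With only 8 GB of memory per coprocessor, a single overloaded rank of this kind could plausibly run out of memory while its peers stay well within limits, which is the sort of pattern that becomes visible when per-process data is compared side by side in a debugger.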
  • 9. TotalView for Xeon Phi
    NICS had a goal to port and optimize scientific codes for the many-core Xeon Phi coprocessor. TotalView helped the Beacon project developers troubleshoot issues that came up during the process.
    There are many other scientific codes that can benefit from the power of Intel Xeon Phi by adopting a hybrid MPI + OpenMP architecture and tuning for the right number of threads per process.
    TotalView is now generally available with support for the Intel Xeon Phi and can help other scientists take advantage of the power of Intel many-core technology.
    Please visit us here at ISC Booth 550 to learn more!
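Tuning the number of threads per process usually starts from the set of layouts that exactly fill the hardware. The sketch below enumerates the (MPI ranks, OpenMP threads per rank) pairs for one coprocessor. It assumes 4 hardware threads per core, a figure not given in the deck, and the function name is made up for this illustration; which layout is actually fastest has to be measured per application.

```python
def rank_thread_layouts(cores, hw_threads_per_core):
    """Return every (ranks, threads_per_rank) pair whose product equals the
    total number of hardware threads on the device."""
    total = cores * hw_threads_per_core
    return [(ranks, total // ranks)
            for ranks in range(1, total + 1)
            if total % ranks == 0]


# 60 cores is the 5110P figure from the slides; 4 threads per core is an assumption.
for ranks, threads in rank_thread_layouts(60, 4):
    print(f"{ranks:3d} ranks x {threads:3d} threads")
```

The candidates range from a single rank driving every hardware thread to one rank per hardware thread; hybrid codes typically settle somewhere in between.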
  • 10. Developing parallel, data-intensive applications is hard. We make it easier.
    www.roguewave.com