TotalView Debugger On Blue Gene

  1. Scalable Debugging with TotalView on Blue Gene
     John DelSignore, CTO, TotalView Technologies
  2. Agenda
     • TotalView on Blue Gene
       – A little history
       – Current status
     • Recent TotalView improvements
       – ReplayEngine (reverse debugging)
       – Remote Display
       – TotalView Script (batch debugging)
     • Future work
       – BG/*
       – Heterogeneous systems
       – Many core, transactional memory, speculative execution
       – Peta-scale debugging
     TotalView Technologies – Confidential and Proprietary – Plans Subject to Change without Notice www.totalviewtech.com
  3. Supported Blue Gene Architectures and Compilers
     • Blue Gene/L and Blue Gene/P
     • Languages / Compilers
       – C/C++, Fortran, Assembly
       – GNU Compilers
       – IBM Compilers
       – IBM OpenMP (on BG/P)
     • Parallel Environments
       – IBM MPI
       – IBM OpenMP (on BG/P)
       – Pthreads (BG/P)
     • Runtime linking/loading (BG/P)
       – Shared libraries
       – Dynamically loaded shared libraries
  4. Blue Gene Architecture
     • The TotalView client (GUI/CLI) runs on the Front End node
     • The client communicates over a socket with the TotalView debugger servers running on the I/O nodes
     • The debugger servers communicate with the CIOD (Control and I/O Daemon) to control the processes and threads running on the Compute nodes
     • Fan-out ratios (CNs/server)
       – BG/L: 32–64, 2 cores/CN, 128 threads/server
       – BG/P: 128–256, 4 cores/CN, 1024 threads/server
       – Ratio increasing (8K threads/server?)
       – Parallelize server operation
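The fan-out figures above multiply out directly: at the top of each CNs/server range, compute nodes per server times cores per node gives the per-server thread counts quoted, assuming one debugged thread per core (an assumption; the slide does not say so). A minimal Python sketch of that arithmetic; `threads_per_server` is a hypothetical helper, not part of TotalView:

```python
# Threads each TotalView debugger server (one per I/O node) must control,
# from the fan-out figures on the slide. Assumes one thread per core.


def threads_per_server(compute_nodes_per_server: int,
                       cores_per_node: int,
                       threads_per_core: int = 1) -> int:
    """Compute nodes per server x cores per node x threads per core."""
    return compute_nodes_per_server * cores_per_node * threads_per_core


if __name__ == "__main__":
    # Upper end of each CNs/server range quoted on the slide.
    print("BG/L:", threads_per_server(64, 2), "threads/server")   # 128
    print("BG/P:", threads_per_server(256, 4), "threads/server")  # 1024
```

Doubling any one factor doubles the load on each debugger server, which is why the slide pairs a rising ratio with parallelizing server operation.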
  5. TotalView Blue Gene/L Support
     • TotalView involvement since 2003
     • Support for Blue Gene/L since 2005
     • Debugging interfaces developed via close collaboration with IBM
     • Used on DOE/NNSA/LLNL's Blue Gene/L system containing 212K cores
       – Heap memory debugging support added
       – Blue Gene/L scaling and performance tuning project
       – TotalView has debugged jobs as large as 8,192 processes (LLNL)
     • Work on Blue Gene/L facilitated Blue Gene/P support
  6. TotalView Blue Gene/P Support
     • Blue Gene/P supported since Q4 2007
     • Continued close collaboration with IBM to develop multi-threaded debugging interfaces
     • Support for shared libraries and dynamically loaded libraries
     • Scalability improvements
     • TotalView has debugged jobs as large as 32K (Jülich)
  7. TotalView Blue Gene/P Sites
     • Currently running at over 30 sites in Germany, France, the UK, and the US, including
       – Argonne
       – Boston University
       – Daresbury
       – IDRIS
       – Jülich
       – LLNL
       – Max Planck
       – ORNL
       – Princeton University
       – Rensselaer Polytechnic Institute
     • Jülich workshop, March 08
     • Argonne workshop, May 08
  8. Recent TotalView Improvements on Blue Gene and Linux
     • Remote Display
       – Run a remote instance of the TotalView GUI and display it locally with fast, interactive performance
       – Easy, fast, secure
     • tvscript
       – Simplifies debugging batch jobs
       – Event/action paradigm
       – Configurable
     • ReplayEngine
       – Steps execution back in time
       – Uses reverse debugging technology
       – Currently Linux x86 and x86-64 only
  9. Remote Display
     • Presents a window on your machine that displays TotalView executing on a remote system
     • Two components:
       – Client: runs on the local system; available for Linux x86 and x86-64, Windows XP, and Windows Vista
       – Server: runs on any system supported by TotalView and invisibly manages the connections between the host and the client
     • The client can also submit jobs to the PBS Pro and LoadLeveler batch queuing systems
  10. Batch Scripting
     • Designed for debugging in a batch environment
     • tvscript lets you define the events to act on and the actions to take when each event occurs
     • Typical events
       – Action point (e.g., breakpoint)
       – Memory error (e.g., malloc returns 0, guard block corruption)
       – Errors (e.g., SEGV, FPE)
     • Typical actions
       – Display a backtrace
       – List memory leaks
       – Print variables and arrays
     • Configurable
       – Supports external script files
       – Allows even more complex actions and events to be defined
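The event/action paradigm can be modeled as a table that maps each event to a list of actions and runs them in order when the event fires. The sketch below is a toy Python model under that assumption; the event strings (`"breakpoint"`, `"signal:SIGSEGV"`), the `EventActionTable` class, and the action functions are hypothetical illustrations and do not reflect tvscript's actual command-line syntax or keywords.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Toy model of an event/action table. Event names and actions here are
# hypothetical and are not tvscript's real syntax.
Action = Callable[[dict], str]


class EventActionTable:
    def __init__(self) -> None:
        self._actions: Dict[str, List[Action]] = defaultdict(list)

    def on(self, event: str, action: Action) -> None:
        """Register an action to run whenever `event` occurs."""
        self._actions[event].append(action)

    def fire(self, event: str, context: dict) -> List[str]:
        """Run the event's actions in registration order and return their
        output lines (a batch run would log these rather than show a GUI)."""
        return [action(context) for action in self._actions.get(event, [])]


def show_backtrace(ctx: dict) -> str:
    return "backtrace: " + " <- ".join(ctx["frames"])


def print_variable(ctx: dict) -> str:
    return f"{ctx['name']} = {ctx['value']}"


if __name__ == "__main__":
    table = EventActionTable()
    table.on("breakpoint", print_variable)
    table.on("signal:SIGSEGV", show_backtrace)
    table.on("signal:SIGSEGV", print_variable)
    crash = {"frames": ["solve", "main"], "name": "i", "value": 42}
    for line in table.fire("signal:SIGSEGV", crash):
        print(line)
```

Keeping events and actions as separate, composable pieces is what lets one configuration attach several actions (a backtrace and a variable dump here) to the same failure.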
  11. ReplayEngine
     • Intuitive user interface, integrated with TotalView
       – Step forward over functions / Step backward over functions
       – Step forward into functions / Step backward into functions
       – Advance forward out of the current function, after the call / Advance backward out of the current function, to before the call
       – Advance forward to selected line / Advance backward to selected line
       – Advance forward to “live” session
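One simple way to picture backward stepping is a recorded history of program states with a cursor: stepping backward moves the cursor, stepping forward replays recorded states until the cursor reaches the end of the recording, and only then executes live. The Python sketch below is a toy model of that idea only; the slides do not describe how ReplayEngine records execution, and `ToyReplay` and its methods are hypothetical names.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Step = Callable[[State], State]


class ToyReplay:
    """Toy model of reverse stepping: keep every state seen so far and move a
    cursor through that history. Not how ReplayEngine is implemented."""

    def __init__(self, initial: State, program: List[Step]) -> None:
        self.program = program
        self.history: List[State] = [dict(initial)]  # history[k]: after k steps
        self.pos = 0                                 # state being inspected

    def step_forward(self) -> State:
        if self.pos + 1 < len(self.history):
            self.pos += 1                            # replay recorded state
        elif self.pos < len(self.program):
            nxt = self.program[self.pos](dict(self.history[self.pos]))
            self.history.append(nxt)                 # live: execute and record
            self.pos += 1
        return self.history[self.pos]

    def step_backward(self) -> State:
        if self.pos > 0:
            self.pos -= 1
        return self.history[self.pos]

    def go_live(self) -> State:
        """Jump forward to the most recently executed ("live") state."""
        self.pos = len(self.history) - 1
        return self.history[self.pos]


if __name__ == "__main__":
    program: List[Step] = [
        lambda s: {**s, "x": s["x"] + 1},
        lambda s: {**s, "y": s["x"] * 10},
        lambda s: {**s, "x": 0},
    ]
    run = ToyReplay({"x": 1, "y": 0}, program)
    for _ in range(3):
        run.step_forward()
    print(run.step_backward())  # {'x': 2, 'y': 20}
    print(run.go_live())        # {'x': 0, 'y': 20}
```

Stepping backward after the third step shows `y` computed from `x` before `x` was reset, which is the kind of question reverse debugging answers without re-running the job.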
  12. Possible Future Blue Gene Work
     • BG/* support
       – Support future generations of Blue Gene
     • Fast conditional breakpoints/watchpoints
       – Expressions compiled/patched into the target, executed in parallel, about 10 µs per expression
     • Asynchronous thread control
       – Thread barrier breakpoint, thread single stepping
     • User-programmable visual data
       – Allows users to define complex data access functions
     • Debugging optimized code
     • Post-mortem debugging
     • Fast DLL debugging interface
     • LLNL collaboration for scalable subset attach
       – Integrates with lightweight tools such as STAT
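A conditional breakpoint is a predicate checked every time execution reaches the breakpoint; the debugger takes control only when the predicate is true. Compiling the predicate into the target means the check runs at program speed instead of trapping to the debugger on every hit. The Python sketch below models that behavior and uses the slide's figure of about 10 µs per expression to estimate the added run time. The function names are hypothetical, and treating the 10 µs figure as a fixed per-hit, per-process cost is an assumption.

```python
from typing import Callable, List

# Per-expression cost quoted on the slide ("about 10usecs/expression").
EVAL_COST_US = 10.0


def run_with_conditional_breakpoint(iterations: int,
                                    condition: Callable[[int], bool]) -> List[int]:
    """Toy loop with a conditional breakpoint at the top of its body.

    The condition is checked on every pass, as patched-in code would be, but
    control goes to the 'debugger' only when it is true. Returns the loop
    indices at which execution would have stopped."""
    stops: List[int] = []
    for i in range(iterations):
        if condition(i):       # in-target check: no debugger round trip
            stops.append(i)    # stand-in for stopping the thread
    return stops


def added_run_time_s(hits: int, cost_us: float = EVAL_COST_US) -> float:
    """Run time added to one process by evaluating the condition on every hit
    (assumes a fixed cost per evaluation)."""
    return hits * cost_us / 1_000_000


if __name__ == "__main__":
    print(run_with_conditional_breakpoint(100_000, lambda i: i % 25_000 == 0))
    # [0, 25000, 50000, 75000]
    print(added_run_time_s(100_000))  # 1.0
```

Because the slide says the expressions execute in parallel, this estimate is per process; under that assumption it would not grow with the number of processes being debugged.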
  13. Possible Other Future Work
     • Scalability/performance
       – Continue scalability and performance improvements
       – Tree-based infrastructure for logarithmic scaling
       – Peta-scale debugging
       – Hundreds of thousands of threads
     • Heterogeneous systems
       – IBM Roadrunner (x86-64/Cell)
       – GPUs
     • Emerging technologies
       – Many core
       – Transactional memory
       – Speculative execution
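Tree-based infrastructure gives logarithmic scaling because a command or reply crosses one tree level per hop: with fan-out k, reaching N endpoints takes ceil(log_k(N)) levels instead of the front end talking to all N directly. The Python sketch below computes that depth with integer arithmetic, which avoids floating-point log rounding at exact powers. `tree_depth` and the fan-out of 32 are illustrative assumptions, since the slide does not describe the tree's design.

```python
def tree_depth(endpoints: int, fanout: int) -> int:
    """Number of tree levels needed so that fanout**levels >= endpoints,
    i.e. ceil(log_fanout(endpoints)), computed with integers only."""
    if endpoints < 1 or fanout < 2:
        raise ValueError("need endpoints >= 1 and fanout >= 2")
    depth, reach = 0, 1
    while reach < endpoints:
        reach *= fanout
        depth += 1
    return depth


if __name__ == "__main__":
    for n in (1_024, 32_768, 300_000):
        # A flat design contacts n endpoints directly; a tree with fan-out 32
        # needs only tree_depth(n, 32) hops from the root to any endpoint.
        print(n, tree_depth(n, 32))
```

Going from about a thousand endpoints to hundreds of thousands adds only two levels at this fan-out, which is the property that makes the approach attractive for peta-scale debugging.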
  14. Questions?
     More Information
     • Blue Gene Technical Development Interest Group
       – Contact chris.gottbrath@totalviewtech.com
     • Technical support
       – support@totalviewtech.com
     • BG LLNL case study
       – www.totalviewtech.com/pdf/case_study_scientific_computing.pdf
     • Customer training or webinars
       – contacttraininggroup@totalviewtech.com
     • Web site
       – www.totalviewtech.com
