STAT: A Debugging Tool For Extreme Scale


                        Martin Schulz
           Center for Applied Scientific Computing
           Lawrence Livermore National Laboratory
ASC STAT Team: Greg Lee, Dong Ahn (LLNL), Dane Gardner (LANL)
        Developed at LLNL, University of Wisconsin &
                  University of New Mexico
         Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551

            This work performed under the auspices of the U.S. Department of Energy by
            Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344

                                                                                         LLNL-PRES-426152
STAT: Debugging Support at Scale
  The debugging challenge at scale
    • Traditional debuggers break down at scale
    • Too much data to display and too many tasks to control
    • Sequential, one-task-at-a-time paradigm
  How can STAT help?
    • Identify equivalence classes of tasks
    • Pre-analysis to select a small subset of tasks for debugging
  Typical use case
    • Application hang (livelock or deadlock)
    • Answer the question: What is my code doing now?


Lawrence Livermore National Laboratory
Stacktraces: The Basis for STAT




Gathering Stack Traces
  STAT gathers stack traces from
    • Multiple processes
    • Multiple samples per process




    [Figure: 2D Trace/Space Call Graph Prefix Tree and 3D Trace/Space/Time Call Graph Prefix Tree]
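A minimal sketch of how stack traces from many tasks, and several samples per task, can be merged into a call graph prefix tree whose nodes are labeled with the tasks that reached them. Assumptions: each trace is a list of function names ordered from outermost to innermost frame, and the task ids and function names in `samples` are hypothetical. This illustrates the data structure only; it is not STAT's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A frame in the call graph prefix tree plus the set of tasks that reached it."""
    name: str
    tasks: set[int] = field(default_factory=set)
    children: dict[str, "Node"] = field(default_factory=dict)


def add_trace(root: Node, task: int, trace: list[str]) -> None:
    """Merge one stack trace (outermost frame first) into the tree."""
    node = root
    node.tasks.add(task)
    for frame in trace:
        # Shared prefixes reuse an existing child; divergent frames branch off.
        node = node.children.setdefault(frame, Node(frame))
        node.tasks.add(task)


def build_tree(samples: dict[int, list[list[str]]]) -> Node:
    """Merge every sample of every task (trace/space/time).

    With exactly one sample per task this is the 2D trace/space tree.
    """
    root = Node("<root>")
    for task, traces in samples.items():
        for trace in traces:
            add_trace(root, task, trace)
    return root


def dump(node: Node, depth: int = 0) -> None:
    """Print each frame indented by depth, labeled with its sorted task list."""
    for child in node.children.values():
        print(f"{'  ' * depth}{child.name} {sorted(child.tasks)}")
        dump(child, depth + 1)


# Hypothetical samples: task id -> stack traces taken at different times.
samples = {
    0: [["main", "solve", "MPI_Barrier"]],
    1: [["main", "solve", "compute"], ["main", "solve", "MPI_Barrier"]],
    2: [["main", "io", "MPI_File_write"]],
}
dump(build_tree(samples))
```

Running it prints `main [0, 1, 2]` at the top, with `solve [0, 1]` and `io [2]` beneath it. Task 1 appears under both `compute` and `MPI_Barrier` because its two samples differ, which is the extra information the time dimension adds.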


Interpreting Stacktrace Trees




[Figure: stack trace tree with Task 0, Task 1 and Task 2 passed to Your Favorite Debugger]
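Tasks that end at the same leaf of the tree share an identical stack trace and so form an equivalence class; inspecting one task per class in a traditional debugger covers every distinct behavior. The sketch below groups tasks by their full trace and picks one representative per class. Assumptions: traces are tuples of function names, the ranks and function names are hypothetical, and choosing the lowest rank as the representative is an arbitrary choice made here for illustration.

```python
from collections import defaultdict


def equivalence_classes(
    traces: dict[int, tuple[str, ...]]
) -> dict[tuple[str, ...], list[int]]:
    """Group tasks whose stack traces are identical (same path to the same leaf)."""
    classes: dict[tuple[str, ...], list[int]] = defaultdict(list)
    for task in sorted(traces):
        classes[traces[task]].append(task)
    return dict(classes)


def representatives(classes: dict[tuple[str, ...], list[int]]) -> list[int]:
    """Pick one task per class (the lowest rank, an arbitrary choice)."""
    return sorted(tasks[0] for tasks in classes.values())


# Hypothetical snapshot of five tasks.
traces = {
    0: ("main", "solve", "MPI_Barrier"),
    1: ("main", "solve", "MPI_Barrier"),
    2: ("main", "solve", "compute"),
    3: ("main", "io", "MPI_File_write"),
    4: ("main", "solve", "MPI_Barrier"),
}
classes = equivalence_classes(traces)
for trace, tasks in classes.items():
    print(" > ".join(trace), tasks)
print("attach debugger to tasks:", representatives(classes))
```

Here five tasks collapse into three classes, so only tasks 0, 2 and 3 need to be examined instead of all five.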




STAT GUI




Availability
Platform             Ver.        Usage                       Documentation                           POC
LLNL/TLCC OCF        0.9.4       STATGUI, STAT               https://computing.llnl.gov/code/STAT/   Greg Lee (lee218@llnl.gov)
LLNL/TLCC SCF        0.9.4       STATGUI, STAT               https://computing.llnl.gov/code/STAT/   Greg Lee (lee218@llnl.gov)
LLNL/uBGL            0.9.0 beta  STAT                        https://computing.llnl.gov/code/STAT/   Greg Lee (lee218@llnl.gov)
LLNL/Dawn            0.9.4 beta  STATGUI, STAT               https://computing.llnl.gov/code/STAT/   Greg Lee (lee218@llnl.gov)
SNL/Glory            0.9.2       see below                   https://computing.llnl.gov/code/STAT/   Mahesh Rajan (mrajan@sandia.gov)
LANL/Yellow Turing   0.9.1b      Mod: hpc-tools, Mod: stat   man stat                                consult@lanl.gov
LANL/Turquoise Lobo  0.9.2       Mod: hpc-tools, Mod: stat   man stat                                consult@lanl.gov



Usage for SNL/Glory:
  module switch mpi mpi/mvapich-1.1_intel-11.1-f064-c064
  module load /home/jgalaro/privatemodules/openss-mvapich

Note: Red Storm has a poor man's STAT-like utility called fast_where.
Try "man fast_where" for usage instructions.

Usage Instructions
  Option 1: Graphical User Interface
    • Launch GUI: STATGUI
    • Attach, gather stack traces and create views through the GUI
  Option 2: Command line
    • STAT <MPI launcher pid>
       − -t: number of traces
       − -T: time between traces
    • Prints the name of the output file to stdout
    • STATview <output file>
  Additional information
    • man STAT / STAT -h
    • acroread /usr/local/tools/stat/doc/*.pdf
Advanced Topics
  Scalable Implementation
    • Tree-based overlay networks
       − Data aggregation on the fly
       − Tree depth configurable
    • Parameters to STAT
    • Useful for 10,000+ tasks
    [Figure: tree-based overlay network with a front end (FE), communication processes (CP) and back ends (BE)]

  Temporal Analysis
    • Finer-grained analysis of process location
    • Disambiguation of iteration instances
    • Employs static analysis to determine loop variables
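A rough sketch of data aggregation on the fly in a tree-based overlay network: each back end (BE) builds a prefix tree for its own tasks, each communication process (CP) merges the trees of its children, and the front end (FE) receives the final merged tree. For brevity the tree is flattened to a map from call path to task set, the processes are plain function calls rather than separate daemons, and the hypothetical `fanout` parameter stands in for the configurable tree depth; none of these names come from STAT itself.

```python
# Hypothetical flattened prefix tree: call path (outermost first) -> task ids.
Tree = dict[tuple[str, ...], set[int]]


def local_tree(task_traces: dict[int, tuple[str, ...]]) -> Tree:
    """BE step: record every prefix of each local task's stack trace."""
    tree: Tree = {}
    for task, trace in task_traces.items():
        for depth in range(1, len(trace) + 1):
            tree.setdefault(trace[:depth], set()).add(task)
    return tree


def merge(trees: list[Tree]) -> Tree:
    """CP/FE step: union the task sets of matching call paths."""
    merged: Tree = {}
    for tree in trees:
        for path, tasks in tree.items():
            merged.setdefault(path, set()).update(tasks)
    return merged


def reduce_tbon(leaves: list[Tree], fanout: int) -> tuple[Tree, int]:
    """Merge level by level up to the FE; return the root tree and the number of levels."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level, levels = leaves, 0
    while len(level) > 1:
        # Each CP at this level merges up to `fanout` child trees.
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        levels += 1
    return level[0], levels


# 16 hypothetical BEs with 4 tasks each; task 13 is stuck in I/O.
backends = [
    local_tree({
        t: ("main", "io") if t == 13 else ("main", "solve", "MPI_Barrier")
        for t in range(be * 4, be * 4 + 4)
    })
    for be in range(16)
]
root, levels = reduce_tbon(backends, fanout=4)
for path, tasks in sorted(root.items()):
    print(" > ".join(path), len(tasks))
```

Because merging is a per-path set union, the FE gets the same tree for any fan-out. A larger fan-out means fewer levels (16 BEs with `fanout=4` need 2 merge levels) but more data arriving at each CP, which is the trade-off a configurable tree depth lets you tune.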

Reference & Demo Session
  Usage documentation
    • https://computing.llnl.gov/code/STAT/
  Man page
    • man STAT or man STATGUI
    • STAT -h
  Background information
    • http://www.paradyn.org/STAT/STAT.html

  Demo Session / Track 3



