Although hardware is evolving at an incredible rate, advances in parallel software have been hampered for many reasons, and developing an efficient parallel application is still not an easy task. Our thesis is that many performance problems and their causes can be quickly located and explained with automated techniques that work on unmodified parallel applications. This work identifies the main obstacles to such diagnosis and presents a two-step approach for addressing them, in which the application is automatically modeled and diagnosed during its execution.
First, we introduce an online performance modeling technique that enables automated discovery of causal execution flows through communication and computational activities in message-passing parallel programs. Second, we present a systematic approach to online performance analysis. The automated analysis uses the online model to quickly identify the most important performance problems and correlate them with the application source code. Our technique is able to discover causal dependencies between the problems, infer their root causes in some scenarios, and explain them to developers. In this work, we focus on diagnosing communication and computational problems in scientific MPI applications, although the approach can be extended to support other classes of activities and programming models.
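To make the idea of a model built from causal execution flows more concrete, here is a minimal, offline toy sketch in Python. It is not the thesis's implementation. It assumes that each MPI rank is modeled as a graph whose nodes are activities (code regions or MPI call sites) and whose edges are observed transitions, and that message edges link a send site on one rank to a receive site on another. The class names (`RankModel`, `AppModel`), the call-site labels such as `MPI_Recv@solve.c:30`, and the "follow the message edge back to the sender" heuristic are all hypothetical illustrations. A real online tool would collect such events through instrumentation, for example the MPI profiling interface (PMPI), rather than from a hand-written trace.

```python
from collections import defaultdict


class RankModel:
    """Hypothetical flow graph for one MPI rank: nodes are activities
    (code regions or MPI call sites), edges are observed transitions."""

    def __init__(self):
        self.time = defaultdict(float)   # activity -> accumulated time (s)
        self.calls = defaultdict(int)    # activity -> number of executions
        self.edges = defaultdict(int)    # (from, to) -> transition count
        self.prev = None                 # last activity seen on this rank

    def record(self, activity, duration):
        self.time[activity] += duration
        self.calls[activity] += 1
        if self.prev is not None:
            self.edges[(self.prev, activity)] += 1
        self.prev = activity


class AppModel:
    """Per-rank flow graphs joined by message edges (send site -> receive site)."""

    def __init__(self, nranks):
        self.ranks = [RankModel() for _ in range(nranks)]
        self.messages = defaultdict(int)  # ((src, send_site), (dst, recv_site)) -> count

    def record(self, rank, activity, duration):
        self.ranks[rank].record(activity, duration)

    def record_message(self, src, send_site, dst, recv_site):
        self.messages[((src, send_site), (dst, recv_site))] += 1

    def hotspot(self):
        """Return the activity with the largest time summed over all ranks."""
        total = defaultdict(float)
        for model in self.ranks:
            for activity, t in model.time.items():
                total[activity] += t
        return max(total, key=total.get)

    def explain(self, recv_site):
        """Follow message edges into recv_site back to each sender and return
        sorted (sender rank, activity that most often preceded the send)."""
        causes = set()
        for (src, send_site), (_dst, site) in self.messages:
            if site != recv_site:
                continue
            # Ignore back-to-back executions of the same send site.
            preds = {a: c for (a, b), c in self.ranks[src].edges.items()
                     if b == send_site and a != send_site}
            if preds:
                causes.add((src, max(preds, key=preds.get)))
        return sorted(causes)


# Hypothetical trace: rank 0 does extra work before sending to ranks 1-3,
# which finish their own work early and wait in MPI_Recv (load imbalance).
app = AppModel(nranks=4)
for _ in range(10):
    app.record(0, "compute@solve.c:10", 0.10)
    for dst in (1, 2, 3):
        app.record(0, "MPI_Send@solve.c:20", 0.001)
        app.record_message(0, "MPI_Send@solve.c:20", dst, "MPI_Recv@solve.c:30")
    for r in (1, 2, 3):
        app.record(r, "compute@solve.c:10", 0.01)
        app.record(r, "MPI_Recv@solve.c:30", 0.09)

hot = app.hotspot()
print(hot)               # MPI_Recv@solve.c:30
print(app.explain(hot))  # [(0, 'compute@solve.c:10')]
```

In this toy trace the receive is the most expensive activity overall (about 2.7 s summed across ranks 1-3, versus about 1.3 s for the computation), but its cause lies on another rank. Following the message edge back to rank 0 points to the computation that precedes the send, which is the kind of cross-process causal explanation the analysis aims to produce. The single "most frequent predecessor" rule used here is a deliberate simplification.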
We have evaluated our approach on a variety of scientific parallel applications. In all scenarios, our online performance modeling technique proved effective at capturing program behavior with low overhead and facilitated performance understanding. With our automated, model-based performance analysis approach, we were able to easily identify the most severe performance problems during application execution and locate their root causes without prior knowledge of application internals.