The document presents a performance tuning methodology for distributed clusters using Intel Trace Analyzer and Collector (ITAC) and Intel VTune Amplifier XE. It gives an overview of each tool's key features and of what is new in recent versions, then outlines a three-step methodology: 1) cluster-level analysis and algorithm tuning, 2) run-time analysis and tuning, and 3) intra-node and single-node analysis. The methodology is demonstrated on a Poisson example, where ITAC and VTune Amplifier XE are used to optimize MPI communication and identify performance issues.