Performance Optimization by Compilers




                                                      Luo Bu
                                                   2010-12-18


             中科信软高级技术培训中心-www.info-soft.cn




              Compiler Optimization Introduction



 IPO (Inter-Procedural Optimization)
 PGO (Profile-Guided Optimization)
 HLO (High-Level Optimization)




Intel guided optimization path

  1. Use the general optimization options
  2. Use processor-specific options
  3. Identify performance hotspots
  4. Use IPO and PGO
  5. Use parallel performance options for multi-core, multi-processor,
     or HT-enabled systems
  6. Use thread profiler to maximize performance






                     IPO (Inter-Procedural Optimization)


Interprocedural Optimization (IPO) allows the compiler to analyze your
code across procedure and source-file boundaries to determine where
specific optimizations can be applied. When you use IPO with the -x or
-ax (Linux* OS) options, or the /Qx or /Qax (Windows* OS) options, you
may see more optimizations for Intel microprocessors than for non-Intel
microprocessors. Optimizations that IPO can perform include:
   inlining                               unreferenced variable removal
   constant propagation                   whole program analysis
   mod/ref analysis                       array dimension padding
   alias analysis                         common block splitting
   forward substitution                   stack frame alignment
   routine key-attribute propagation      structure splitting and field reordering
   address-taken analysis                 formal parameter alignment analysis
   partial dead call elimination          C++ class hierarchy analysis
   symbol table data promotion            indirect call conversion
   common block variable coalescing       specialization
   dead function elimination
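
Two of the items above, inlining and constant propagation, can be pictured with a small hand-written sketch. The function names (scale, sum_scaled_before, sum_scaled_after) are hypothetical, and the "after" version is only an illustration of the kind of code these optimizations could produce, not actual compiler output. In a real project scale() would sit in a different source file, which is the case where IPO (icc -ipo, or link-time optimization with gcc -flto) is needed for the compiler to see both sides of the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper; imagine it is defined in another source file. */
int scale(int x, int factor)
{
    if (factor == 0)            /* dead once factor is known to be 2 */
        return 0;
    return x * factor;
}

/* As written by the programmer: one call per element. */
long sum_scaled_before(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += scale(v[i], 2);
    return total;
}

/* Hand-written picture of the same loop after scale() is inlined and
   the constant 2 is propagated: no call and no branch remain. */
long sum_scaled_after(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i] * 2;
    return total;
}

/* Self-check: both versions must agree on a small sample. */
void check_ipo_sketch(void)
{
    const int sample[] = {3, -1, 4, 1, -5, 9};
    const size_t n = sizeof sample / sizeof sample[0];
    assert(sum_scaled_before(sample, n) == sum_scaled_after(sample, n));
}
```

Calling check_ipo_sketch() from a main() confirms that the two versions compute the same result; whether a given compiler actually performs this rewrite depends on the options used and on its own heuristics.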

PGO (Profile-Guided Optimization)

Profile-Guided Optimization (PGO) improves application performance by
reorganizing code layout to reduce instruction-cache problems,
shrinking code size, and reducing branch mispredictions. PGO provides
information to the compiler about areas of an application that are
most frequently executed. By knowing these areas, the compiler is able
to be more selective and specific in optimizing the application.
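
The branch-layout effect described above can be approximated by hand. The sketch below marks a rarely taken branch with __builtin_expect, a builtin available in GCC-compatible compilers; PGO reaches a similar result automatically from measured execution counts. The function count_valid, the UNLIKELY macro, and the assumption that negative values are rare are all hypothetical, and the file names in the build recipe are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Typical three-step PGO build (file names are placeholders):
 *   gcc -O2 -fprofile-generate app.c -o app   (instrumented build)
 *   ./app                                     (training run writes profile data)
 *   gcc -O2 -fprofile-use app.c -o app        (rebuild using the profile)
 * The Intel compiler uses -prof-gen and -prof-use for the same steps.
 */

#if defined(__GNUC__)
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define UNLIKELY(cond) (cond)   /* fallback: no hint */
#endif

/* Counts non-negative entries. Negative entries are assumed to be the
   rare case, so that branch is hinted as cold. */
size_t count_valid(const int *v, size_t n, size_t *rejected)
{
    size_t valid = 0;
    *rejected = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0)) {   /* cold path */
            (*rejected)++;
            continue;
        }
        valid++;                    /* hot path */
    }
    return valid;
}

/* Self-check: the hint must not change the result. */
void check_count_valid(void)
{
    const int sample[] = {5, -2, 7, 0, -9};
    size_t rejected = 0;
    assert(count_valid(sample, 5, &rejected) == 3);
    assert(rejected == 2);
}
```

A manual hint encodes the programmer's guess, while a profile records what a training run actually did, so PGO is only as representative as the input used for that run.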





                       HLO (High-Level Optimization)


High-Level Optimization (HLO) exploits the properties of source code
constructs in applications developed in high-level programming
languages. While the default optimization level, -O2, performs some
high-level optimizations, specifying -O3 provides the best chance for
performing loop transformations to optimize memory accesses.

   Loop Permutation or Interchange    Predicate Optimization
   Loop Distribution                  Loop Reversal
   Loop Fusion                        Profile-Guided Loop Unrolling
   Loop Unrolling                     Loop Peeling
   Data Prefetching                   Data Transformation: Malloc Combining and Memset Combining
   Scalar Replacement                 Loop Rerolling
   Unroll and Jam                     Memset and Memcpy Recognition
   Loop Blocking or Tiling            Statement Sinking for Creating Perfect Loop Nests
   Partial-Sum Optimization
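
As an illustration of the first item, loop interchange, the sketch below writes the same array sum in two loop orders. The names (ROWS, COLS, sum_column_order, sum_row_order) are hypothetical, and the second function is a hand-written picture of the transformed loop nest, not compiler output. C stores arrays row by row, so the interchanged version reads memory with unit stride.

```c
#include <assert.h>

#define ROWS 64
#define COLS 48

/* Before: the inner loop walks down a column, so consecutive accesses
   are COLS ints apart in memory. */
long sum_column_order(int a[ROWS][COLS])
{
    long total = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            total += a[i][j];
    return total;
}

/* After interchange: the inner loop walks along a row, so consecutive
   accesses are adjacent in memory. */
long sum_row_order(int a[ROWS][COLS])
{
    long total = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            total += a[i][j];
    return total;
}

/* Self-check: both loop orders must produce the same sum. */
void check_interchange(void)
{
    static int a[ROWS][COLS];
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] = i * COLS + j;
    assert(sum_column_order(a) == sum_row_order(a));
}
```

The interchange is safe here because integer addition gives the same total in any order. With floating-point data, reordering a reduction can change rounding, which is one reason a compiler may need additional options before it applies such a transformation.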


Truth from Graph

 [Figure: benchmark tuning curve with gcc/icc under various optimization flags]








                     Compiler optimization

 GCC Optimization
    •   http://www.linuxjournal.com/article/7269
    •   http://www.network-theory.co.uk/docs/gccintro/gccintro_49.html
    •   http://gcc.gnu.org/wiki/LinkTimeOptimization
    •   http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html


 Intel C++ Optimization(aka icc)
    • http://software.intel.com/sites/products/collateral/hpc/compilers/compiler_qrg12.pdf
    • http://software.intel.com/file/6282
    • http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/cpp/lin/index.htm
    • http://software.intel.com/en-us/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations/


Q&A (Questions and Answers)



