G Ipa

  • Section G Inter-Procedural Analysis and Optimizations
  • Roles of IPA
    • The only optimization component operating at program scope
    • Analysis: collects information from the entire program
    • Optimization: performs optimizations across procedure boundaries
    • Depends on later phases for full optimization effects
    • Supplies cross-file information for later optimization phases
  • IPA Flow
  • Preparation for IPA
    • Pre-IPA (IPL) phase:
      - Extension of the front end for a compilation unit
      - Calls VHO followed by PREOPT
      - Summarizes pre-optimized WHIRL information
    • Fake .o file has new ELF sections:
      - WHIRL section
      - Summary information section
    • Flow: source -> FE -> WHIRL -> IPL -> .o (WHIRL + summary info)
  • Summary Information
    • Purpose: IPA analysis does not need to pass over the WHIRL itself
    • Information stored for each PU:
      - Statistics on PU contents
      - Feedback information per PU
      - Formal parameters
      - Assignments to globals
      - Call sites in the PU
      - Actual parameters
  • Summary Information (continued)
      - Constant symbols and their values
      - IPA-relevant symbols and their mod/ref
      - SSA graph for relevant symbols
      - Selected simple expressions and statements
      - Control dependences
      - Common blocks and dimensions
      - Struct access information
    • See ipa/local/ipl_summary.h
  • Main IPA Phase Overview
    • Uses the pre-linker approach:
      - IPA phase built into ipa.so
      - ipa.so linked with GNU ld; the result is called ipa_link
    • Mode of operation:
      1. Pass over all input files:
         - Perform symbol resolution
         - Read in summary information
      2. Inter-procedural analysis
         - Works only on summary information
      3. Inter-procedural optimization
         - Reads in and modifies WHIRL
      4. Emit output
  • Symbol Resolution
    • ipa_link and ld (without -ipa) see the same objects on the link line
    • Identical symbol resolution rules between ipa_link and ld
    • Global symbols from all files are merged into a single global symbol table
      - Single unified global symbol table used in the rest of the compilation
      - Stored in symtab.I
  • Symbol Table Merging
    • Tables not specific to a PU have to be merged
    • Top-level driver is IPC_merge_global_tab [common/ipc_symtab_merge.cxx]
    • Tables are created to map old indices in the old tables to their new indices
    • Order of merging:
      1. String tables
      2. Types
      3. TCONs
      4. Symbols
      5. INITOs and INITVs
      6. ST_ATTR (symbol attributes)
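The old-to-new index maps on the slide can be illustrated with a small sketch. It merges per-file string tables into one global table, deduplicating identical entries, and records for each file a list mapping its old indices to new ones so that a later table (here, a toy symbol table) can be rewritten. The names `merge_string_tables` and `remap_symbols` are hypothetical; this is not the `IPC_merge_global_tab` interface, and real Open64 tables carry far more than a name and a kind.

```python
from typing import Dict, List, Tuple


def merge_string_tables(files: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Merge per-file string tables into one global table (hypothetical sketch).

    Returns the merged table and, for each input file, a list mapping
    old index -> new index in the merged table.
    """
    merged: List[str] = []
    index_of: Dict[str, int] = {}  # string -> its index in the merged table
    remaps: List[List[int]] = []
    for table in files:
        remap: List[int] = []
        for s in table:
            if s not in index_of:  # first occurrence across all files
                index_of[s] = len(merged)
                merged.append(s)
            remap.append(index_of[s])
        remaps.append(remap)
    return merged, remaps


def remap_symbols(symbols: List[Tuple[int, str]], remap: List[int]) -> List[Tuple[int, str]]:
    """Rewrite one file's (name_index, kind) symbols to use merged string indices."""
    return [(remap[name_idx], kind) for name_idx, kind in symbols]


if __name__ == "__main__":
    merged, remaps = merge_string_tables([["main", "counter"], ["counter", "helper"]])
    print(merged)  # ['main', 'counter', 'helper']
    print(remaps)  # [[0, 1], [1, 2]]
    print(remap_symbols([(0, "var"), (1, "func")], remaps[1]))  # [(1, 'var'), (2, 'func')]
```

Merging strings first means every later table that refers to names can be rewritten with the finished string remap, which is presumably one reason the slide lists string tables before types and symbols.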
  • Simple alias and mod/ref analysis
    • Mod/ref analysis for globals and formal reference parameters:
      - Performed in a backward (post-order) pass over the IPA call graph
      - Propagates direct references or modifications to callers (upward in the call graph)
    • Indirect reference analysis:
      - Performed in a forward (pre-order) pass over the IPA call graph
      - Propagates addresses of symbols passed as actual parameters to callees (mainly for Fortran)
      - Gives better alias information when reference parameters are accessed in callees
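The backward mod propagation can be sketched as follows, assuming a call graph given as a dict from function name to callee names (every function must appear as a key) and a per-function set of globals it modifies directly. Visiting callees before callers (post-order) means an acyclic graph is finished in one pass; the outer loop only matters for recursive cycles. Only mod over globals is shown; ref is symmetric, and mapping actuals to formal reference parameters at each call site is omitted. All names here are hypothetical.

```python
from typing import Dict, List, Set


def post_order(graph: Dict[str, List[str]], root: str) -> List[str]:
    """Return functions reachable from root, each after all of its callees."""
    seen: Set[str] = set()
    order: List[str] = []

    def visit(node: str) -> None:
        seen.add(node)
        for callee in graph[node]:
            if callee not in seen:
                visit(callee)
        order.append(node)  # emitted only after its callees

    visit(root)
    return order


def propagate_mod(graph: Dict[str, List[str]],
                  direct_mod: Dict[str, Set[str]],
                  root: str) -> Dict[str, Set[str]]:
    """For each function, the globals it may modify directly or through calls."""
    mod = {f: set(direct_mod.get(f, set())) for f in graph}
    order = post_order(graph, root)
    changed = True
    while changed:  # a second pass is needed only when there are recursive cycles
        changed = False
        for f in order:
            for callee in graph[f]:
                before = len(mod[f])
                mod[f] |= mod[callee]  # propagate upward to the caller
                if len(mod[f]) != before:
                    changed = True
    return mod


if __name__ == "__main__":
    graph = {"main": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
    print(propagate_mod(graph, {"a": {"x"}, "c": {"g"}}, "main"))
```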
  • Inter-procedural Constant Propagation
    • The only user of mod/ref information in IPA
    • Resolves formal parameters to possible constant values across all calls to the current function
    • Information is propagated when a formal is passed on as an actual in another call and is not modified inside the function
    • In the optimization phase:
      - If a formal resolves to a single constant, delete the parameter and replace it by the constant
      - Dead code elimination by folding of conditional branches
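One way to picture "resolving formals to constant values among calls" is an optimistic lattice: each formal starts at TOP (no call seen), drops to a constant when a call passes one, and drops to BOTTOM when calls disagree. The sketch below assumes each actual is an int constant, the name of an unmodified formal of the caller (the pass-through case, which is where mod/ref is consulted), or `None` for unknown. The slide does not specify Open64's exact algorithm; `propagate_constants` and its inputs are hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

TOP = object()     # no call site seen yet (optimistic start)
BOTTOM = object()  # more than one value, or an unknown value

Actual = Union[int, str, None]        # constant, caller's formal name, or unknown
Call = Tuple[str, str, List[Actual]]  # (caller, callee, actuals)


def meet(a, b):
    """Lattice meet: TOP is above every constant, BOTTOM below every constant."""
    if a is TOP:
        return b
    if b is TOP:
        return a
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    return a if a == b else BOTTOM


def same(a, b) -> bool:
    return a is b or (isinstance(a, int) and isinstance(b, int) and a == b)


def propagate_constants(formals: Dict[str, List[str]], calls: List[Call],
                        roots: Set[str], modified: Dict[str, Set[str]]):
    """Map (function, formal) -> int constant, TOP, or BOTTOM."""
    val = {(f, p): (BOTTOM if f in roots else TOP)
           for f, params in formals.items() for p in params}
    changed = True
    while changed:  # values only move downward, so this terminates
        changed = False
        for caller, callee, actuals in calls:
            for param, actual in zip(formals[callee], actuals):
                if isinstance(actual, int):
                    incoming = actual
                elif isinstance(actual, str) and actual not in modified.get(caller, set()):
                    incoming = val[(caller, actual)]  # pass-through of an unmodified formal
                else:
                    incoming = BOTTOM  # unknown, or formal modified in the caller
                new = meet(val[(callee, param)], incoming)
                if not same(new, val[(callee, param)]):
                    val[(callee, param)] = new
                    changed = True
    return val
```

With `formals = {"main": [], "f": ["n"], "g": ["k"]}` and calls `main -> f(8)`, `main -> f(8)`, `f -> g(n)`, both `f.n` and `g.k` resolve to 8, so the parameter could be deleted as the slide describes; adding a `main -> f(4)` call drives both to BOTTOM.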
  • Cloning
    • Applicable when a formal parameter resolves to possible constant values
    • The formal parameter is replaced by a constant value in each cloned copy
    • Cloning is more profitable if the formal parameter is used in:
      - array references
      - loop bounds
      - branch conditions
    • Cloning is not performed when:
      - the PU has alternate entry points
      - nested or contained PUs are involved
      - the PU is not hot
      - the PU is too large
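A minimal sketch of the cloning decision, assuming the constant actuals at each call site have already been collected: the slide's four exclusion conditions become a legality check, and call sites passing the same constant vector share one clone. The `PUInfo` fields, the `MAX_CLONE_SIZE` threshold, and the `.cloneN` naming are all hypothetical; the profitability weighting (array references, loop bounds, branch conditions) is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MAX_CLONE_SIZE = 500  # hypothetical size limit, not an Open64 value


@dataclass
class PUInfo:
    name: str
    is_hot: bool
    size: int
    has_alternate_entry: bool
    has_nested_pus: bool


def can_clone(pu: PUInfo) -> bool:
    """Mirror the slide's conditions under which cloning is not performed."""
    return (not pu.has_alternate_entry and not pu.has_nested_pus
            and pu.is_hot and pu.size <= MAX_CLONE_SIZE)


def plan_clones(pu: PUInfo,
                call_site_constants: List[Optional[Tuple[int, ...]]]) -> List[str]:
    """For each call site, the name of the PU it should call.

    call_site_constants[i] is the tuple of constant actuals at call site i,
    or None if some actual there is not constant.
    """
    if not can_clone(pu):
        return [pu.name] * len(call_site_constants)
    clones: Dict[Tuple[int, ...], str] = {}
    targets: List[str] = []
    for consts in call_site_constants:
        if consts is None:
            targets.append(pu.name)  # keep calling the original PU
            continue
        if consts not in clones:  # one clone per distinct constant vector
            clones[consts] = f"{pu.name}.clone{len(clones)}"
        targets.append(clones[consts])
    return targets


if __name__ == "__main__":
    hot = PUInfo("scale", is_hot=True, size=120, has_alternate_entry=False, has_nested_pus=False)
    print(plan_clones(hot, [(2,), (4,), (2,), None]))
```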
  • PU Re-ordering
    • Determines the best placement of PUs in IPA's output WHIRL files
    • Possible only if optimization options are consistent across files
    • Based on feedback data
    • Two modes supported:
      1. Based on node frequency (-IPA:pu_reorder=1)
      2. Based on call-edge frequency (-IPA:pu_reorder=2)
         - Operates on a transient undirected version of the call graph
         - While there are still call edges in the graph:
           a. Pick the call edge with the highest frequency and emit the caller-callee pair
           b. Merge the emitted caller and callee into a single node
           c. Update frequencies of call edges to/from the merged node
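The call-edge-frequency mode (steps a to c above) can be sketched as a greedy merge over an undirected weighted graph. This assumes feedback frequencies are given per directed (caller, callee) edge and that `pus` lists every PU; self-recursive edges are ignored, parallel edges between merged nodes are summed, ties are broken by name, and PUs with no call edges are appended in their original order. Because the graph is undirected, a pair is emitted in name order rather than strictly caller-first. The function name is hypothetical and the node-frequency mode is not shown.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def reorder_pus(edges: Dict[Tuple[str, str], int], pus: List[str]) -> List[str]:
    # Transient undirected graph: {a, b} -> summed call frequency.
    weight: Dict[FrozenSet[str], int] = {}
    for (caller, callee), freq in edges.items():
        if caller == callee:
            continue  # self-recursion does not affect placement
        key = frozenset((caller, callee))
        weight[key] = weight.get(key, 0) + freq

    members: Dict[str, List[str]] = {pu: [pu] for pu in pus}  # node -> PUs merged into it
    emitted: List[str] = []
    seen: Set[str] = set()
    while weight:
        # a. Highest-frequency edge; ties broken by PU names for determinism.
        key = min(weight, key=lambda k: (-weight[k], sorted(k)))
        u, v = sorted(key)
        for pu in members[u] + members[v]:
            if pu not in seen:  # emit the pair (only PUs not yet placed)
                seen.add(pu)
                emitted.append(pu)
        # b. Merge v into u.
        members[u] = members[u] + members.pop(v)
        del weight[key]
        # c. Redirect v's remaining edges to u, summing frequencies.
        for k in list(weight):
            if v in k:
                (other,) = k - {v}
                w = weight.pop(k)
                nk = frozenset((u, other))
                weight[nk] = weight.get(nk, 0) + w

    emitted += [pu for pu in pus if pu not in seen]  # PUs never on a call edge
    return emitted


if __name__ == "__main__":
    freq = {("main", "a"): 10, ("a", "b"): 50, ("main", "c"): 5}
    print(reorder_pus(freq, ["main", "a", "b", "c", "d"]))  # ['a', 'b', 'main', 'c', 'd']
```

The hottest pair (a, b) is placed first and adjacently, which is the I-cache locality the slide is after.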
  • Alias Class Analysis
    • Based on Steensgaard's "Points-to Analysis in Almost Linear Time" (1996)
    • Objective: partition data objects into alias classes such that aliases occur only among objects within the same class
    • Algorithm:
      1. At the start, assume no object is aliased with anything
         - Each object is assigned its own unique alias class number
      2. Pass over each statement in the program
         - If the semantics of a statement may cause two objects to alias, merge their alias classes (e.g. p = q implies *p aliases with *q)
    • Effective only if all occurrences are covered
      - Assume aliasing otherwise
    • Intra-procedural version repeated in WOPT for locals
    • Results used only in WOPT's alias analysis
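The merging step maps naturally onto union-find, which is what gives Steensgaard's method its near-linear time. The sketch below is a simplified variant: each class has at most one pointee class, and merging two classes also merges their pointees. Unlike the paper, it creates placeholder pointee nodes eagerly instead of using conditional (pending) joins, so it can merge somewhat more than necessary. The class and method names are hypothetical and this is not the code in ipo_alias_class.cxx.

```python
from typing import Dict, Hashable


class AliasClasses:
    """Union-find over abstract objects; each class has at most one pointee class."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.pointee: Dict[Hashable, Hashable] = {}  # class rep -> a member of its pointee class

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)  # an unseen object starts in its own class
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def _set_pointee(self, x: Hashable, target: Hashable) -> None:
        rep = self.find(x)
        existing = self.pointee.get(rep)
        if existing is None:
            self.pointee[rep] = target
        else:
            self.union(existing, target)  # a class may point to only one class

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        pa, pb = self.pointee.pop(ra, None), self.pointee.pop(rb, None)
        self.parent[rb] = ra
        for p in (pa, pb):  # merged class points to the join of both pointees
            if p is not None:
                self._set_pointee(ra, p)

    def pointee_of(self, x: Hashable) -> Hashable:
        rep = self.find(x)
        if rep not in self.pointee:
            self.pointee[rep] = ("*", rep)  # placeholder for a not-yet-known target
        return self.find(self.pointee[rep])

    # One handler per statement form.
    def assign_addr(self, p, x) -> None:   # p = &x
        self._set_pointee(p, x)

    def assign_copy(self, p, q) -> None:   # p = q  => *p aliases *q
        self.union(self.pointee_of(p), self.pointee_of(q))

    def assign_store(self, p, q) -> None:  # *p = q
        self.union(self.pointee_of(self.pointee_of(p)), self.pointee_of(q))

    def may_alias(self, a, b) -> bool:
        return self.find(a) == self.find(b)


if __name__ == "__main__":
    ac = AliasClasses()
    ac.assign_addr("p", "x")
    ac.assign_addr("q", "y")
    ac.assign_addr("r", "z")
    ac.assign_copy("p", "q")       # x and y now share a class
    print(ac.may_alias("x", "y"))  # True
    print(ac.may_alias("x", "z"))  # False
```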
  • Phase Structure
    1. Merge symbol tables [common/ipc_symtab_merge.cxx]
    2. Position-independent code (PIC) optimizations [common/ipc_pic.cxx]
    Analysis:
      1. Common block padding [main/analyze/ipa_pad.cxx]
      2. Build call graph [main/analyze/ipa_cg.cxx]
      3. Global dead variable elimination and constant global variable identification [common/ipc_pic.cxx]
      4. Structure field reordering legality analysis [main/analyze/ipa_reorder.cxx]
      5. Dead function elimination [main/analyze/ipa_cg.cxx]
      6. Simple alias and mod/ref analysis [main/analyze/ipaa.cxx]
      7. Cloning analysis [main/analyze/ipa_section_prop.cxx]
      8. Inter-procedural constant propagation [main/analyze/ipa_cprop.cxx]
      9. Dead code elimination [main/analyze/ipa_cprop_annot.cxx]
      10. Inlining analysis [main/analyze/ipa_inline.cxx]
  • Phase Structure (continued)
    Optimization:
      1. Common block padding [main/optimize/ipo_pad.cxx]
      2. Common block splitting [main/optimize/ipo_split.cxx]
      3. Structure field re-ordering [main/optimize/ipa_reorder.cxx]
      4. For each PU p in post-order traversal of the call graph: [main/optimize/ipo_main.cxx]
         a. Read the PU's WHIRL [common/ipc_bread.cxx]
         b. Icall optimization [main/optimize/ipo_icall.cxx]
         c. Update code for common block padding [main/optimize/ipo_pad.cxx]
         d. Update code for global constant propagation [main/optimize/ipo_const.cxx]
         e. Update code for structure field re-ordering [main/optimize/ipa_reorder.cxx]
         f. Cloning [main/analyze/ipa_cg.cxx]
         g. Propagate constants into function arguments [main/optimize/ipo_const.cxx]
  • Phase Structure (continued)
    Optimization (continued):
         h. For each call q in p:
            1) Update if the call was eliminated by dead code elimination [ipa/main/optimize/ipo_dce.cxx]
            2) Otherwise, if inlining (but not recursively), inline the call [ipa/main/optimize/ipo_inline.cxx]
            3) Dead function elimination of q if it is no longer called [ipa/main/optimize/ipo_main.cxx]
         i. Recursive inlining [main/optimize/ipo_inline.cxx]
         j. Output p to a .I file [common/ipc_bwrite.cxx]
            - Use a new file if the last file is big enough
      5. PU re-ordering to optimize for I-cache [main/optimize/ipo_main.cxx]
      6. Alias class analysis [main/optimize/ipo_alias_class.cxx]
      7. Output results of alias class analysis as a WHIRL map [main/optimize/ipo_main.cxx]
      8. Output merged global symbol table [common/ipc_bwrite.cxx]
  • Output of ipa_link
    1. Global symbol table for the entire program in symtab.I
    2. Remaining PUs plus their local symbol tables in 1.I, 2.I, etc.
    3. Subdirectory to contain the .I files
    4. Makefile in the subdirectory
    5. linkopt file for the final (real) ld invocation
    6. cmdfile to invoke make using the Makefile
  • Post-IPA Compilation
    • Effected via make in the subdirectory created by ipa_link:
      1. symtab.I compiled into symtab.o; symtab.G emitted
      2. Remaining .I files compiled into .o files, referring to symtab.G for the global symbol table
         - Can use parallel make
      3. Invoke the final ld using the arguments in file linkopt