C Wopt I
 

© All Rights Reserved


    C Wopt I Presentation Transcript

    • Section C: Global Scalar Optimization I (Pre-OPT)
    • Global Scalar Optimizer (WOPT) Overview
      • Works at function scope
      • Builds the control flow graph
      • Performs alias analysis
      • Represents the program in SSA form
      • Uses SSA-based optimization algorithms
      • Multiple phases cooperate to achieve the final effects
      • Phase order is designed to maximize effectiveness
      • Separated into Preopt and Mainopt
      • Pre-opt serves as a pre-optimizing front end for LNO and IPA (in High WHIRL)
      • Provides use-def info to LNO and IPA
      • Provides alias info to CG
    • Topics of this session
      1. Fundamentals of SSA
      2. Some SSA-based optimizations
      3. Alias analysis in WOPT
      4. WOPT's SSA representation
      5. Representing aliasing in WOPT
      6. Generalization of SSA optimizations to any memory operations
      7. Optimization of sign- and zero-extension operations
      8. Pre-optimizer phase structure
      9. SSAPRE
      10. Other redundancy-themed optimizations
      11. Main-optimizer phase structure
    • What is SSA
      • By name: Static Single Assignment form, in which only one definition is allowed per variable over the entire program
      • By functionality: a program representation with built-in use-def dependency information
      • Use-def: a unidirectional edge from each use to its definition
    • Use-def Dependencies in Straight-line Code
      • Each use must be defined by one and only one def
      • Straight-line code is trivially single-assignment
      • Uses-to-defs is a many-to-1 mapping
      • Each def dominates all its uses
      [Figure: straight-line code alternating defs "a =" and uses of a]
    • Use-def Dependencies in Non-straight-line Code
      • Many uses to many defs
        • Overhead in representation
        • Hard to manage
      • The good properties of straight-line code can be recovered by using SSA form
      [Figure: three defs "a =" reaching three uses of a across a join point]
    • Factoring Operator φ
      • Factoring: when multiple edges cross a join point, create a common node Φ that all edges must pass through
        • Number of edges reduced from 9 to 6
        • A Φ is regarded as a def (its parameters are uses)
        • Many uses to 1 def
        • Each def dominates all its uses (uses in Φ operands are regarded as occurring at the predecessors)
      [Figure: three defs "a =" feed a = φ(a, a, a) at the join point, which reaches three uses of a]
    • Rename to represent use-def edges
      • No longer necessary to represent the use-def edges explicitly
      [Figure: defs a1 =, a2 =, a3 = feed a4 = φ(a1, a2, a3); all three uses become a4]
    • Putting program into SSA form
      • Φ needed only at the dominance frontiers of defs (where a def stops dominating)
      • Dominance frontiers are pre-computed based on the control flow graph
      • Two phases:
        1. Insert Φ's at the dominance frontiers of each def (recursive)
        2. Rename the uses to their defs' names
           • Maintain and update a stack of variable versions in a pre-order traversal of the dominator tree
    • Example Phase 1: Φ Insertion
      Steps:
      • def at BB 3 → Φ at BB 4
      • Φ def at BB 4 → Φ at BB 2
      [Figure: BB 1: a = ; BB 2: a = φ(a, a); BB 3: a = ; BB 4: a = φ(a, a)]
    • Example Phase 2: Rename (visiting BB 1)
      [Figure: BB 1: a1 = ; BB 2: a = φ(a, a1); BB 3: a = ; BB 4: a = φ(a, a); stack for a: a1; dominator tree over BBs 1 to 4]
    • Example Phase 2: Rename (visiting BB 2)
      [Figure: BB 1: a1 = ; BB 2: a2 = φ(a, a1); BB 3: a = ; BB 4: a = φ(a2, a); stack for a: a2, a1]
    • Example Phase 2: Rename (visiting BB 3)
      [Figure: BB 1: a1 = ; BB 2: a2 = φ(a, a1); BB 3: a3 = ; BB 4: a = φ(a2, a3); stack for a: a3, a2, a1]
    • Example Phase 2: Rename (visiting BB 4)
      [Figure: BB 1: a1 = ; BB 2: a2 = φ(a4, a1); BB 3: a3 = ; BB 4: a4 = φ(a2, a3); stack for a: a4, a2, a1]
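The Φ-insertion phase of the example can be sketched in a few lines of Python. This is only an illustration, not WOPT's implementation: the function names are hypothetical, and the CFG edges (1→2, 2→3, 2→4, 3→4, 4→2) are an assumption inferred from the Φ operands shown in the final rename step. Dominators are computed with a simple iterative set intersection, which assumes every block is reachable from the entry.

```python
def dominators(cfg, entry):
    """Return (dom, preds), where dom[n] is the set of blocks dominating n.

    Assumes every block in cfg is reachable from entry.
    """
    preds = {n: set() for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(cfg) for n in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in cfg:
            if n == entry:
                continue
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, preds


def dominance_frontiers(cfg, entry):
    """DF(x) = blocks y such that x dominates a predecessor of y
    but does not strictly dominate y."""
    dom, preds = dominators(cfg, entry)
    df = {n: set() for n in cfg}
    for y in cfg:
        for p in preds[y]:
            for x in dom[p]:
                strictly_dominates_y = x in dom[y] and x != y
                if not strictly_dominates_y:
                    df[x].add(y)
    return df


def place_phis(df, def_blocks):
    """Phase 1: each inserted phi is itself a def, so iterate to a fixed point."""
    phis, work = set(), list(def_blocks)
    while work:
        b = work.pop()
        for y in df[b]:
            if y not in phis:
                phis.add(y)
                work.append(y)
    return phis


# Assumed CFG for the example: 1->2, 2->3, 2->4, 3->4, 4->2
cfg = {1: [2], 2: [3, 4], 3: [4], 4: [2]}
df = dominance_frontiers(cfg, entry=1)
phi_blocks = place_phis(df, def_blocks={1, 3})  # "a =" appears in BB 1 and BB 3
print(sorted(phi_blocks))
```

Under these assumptions the def in BB 3 forces a Φ in BB 4, and that Φ in turn forces one in BB 2, matching the two steps listed on the Phase 1 slide.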
    • First Application: Dead Store Elimination
      Steps:
      1. Assume all defs are dead and all statements are not required
      2. Mark the following statements required:
         a. Function return values
         b. Statements with side effects
         c. Defs of global variables
      3. Variables in required statements are live
      4. Propagate liveness backwards iteratively through:
         a. use-def edges: when a variable is live, its def statement is made live
         b. control dependences
    • Control Dependence
      • Statements in branched-to blocks depend on the conditional branch
      • Equivalent to the post-dominance frontier (dominance frontier of the inverted control flow graph)
      [Figure: the statement "x =" in a block branched to by "if (i < n)"]
    • Example of dead store elimination
      Propagation steps:
      1. return s2 → s2
      2. s2 → s2 = s3 * s3
      3. s3 → s3 = φ(s2, s1)
      4. s1 → s1 =
      5. return s2 → if (i2 < 10) [control dependence]
      6. i2 → i2 = i3 + 1
      7. i3 → i3 = φ(i2, i1)
      8. i1 → i1 =
      Nothing is dead
      [Figure: i1 = ; s1 = ; loop: i3 = φ(i2, i1); s3 = φ(s2, s1); i2 = i3 + 1; s2 = s3 * s3; if (i3 < 10); after the loop: return s2]
    • Example of dead store elimination
      • All statements not required; whole loop deleted
      [Figure: before: i1 = ; s1 = ; loop: i3 = φ(i2, i1); s3 = φ(s2, s1); i2 = i3 + 1; s2 = s3 * s3; if (i3 < 10); after: empty]
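The marking algorithm can be sketched as a worklist over use-def edges and control dependences. The statement encoding below is hypothetical and much simpler than WOPT's stmtrep nodes. Two modelling choices are assumptions: the loop statements are treated as control dependent on the loop branch, and the branch is modelled as testing i2, as in the listed propagation steps.

```python
def live_statements(stmts):
    """Return the ids of statements that survive dead store elimination."""
    # Map each SSA variable to the single statement that defines it.
    def_of = {v: sid for sid, s in stmts.items() for v in s["defs"]}
    live = set()
    work = [sid for sid, s in stmts.items() if s["required"]]
    while work:
        sid = work.pop()
        if sid in live:
            continue
        live.add(sid)
        for v in stmts[sid]["uses"]:          # follow use-def edges
            work.append(def_of[v])
        if stmts[sid]["ctrl"] is not None:    # follow control dependence
            work.append(stmts[sid]["ctrl"])
    return live


def stmt(defs=(), uses=(), required=False, ctrl=None):
    return {"defs": set(defs), "uses": set(uses),
            "required": required, "ctrl": ctrl}


def loop_example(returns_s2):
    """The loop from the slides; the return uses s2 only when returns_s2 is True."""
    return {
        "i1 =": stmt(defs=["i1"]),
        "s1 =": stmt(defs=["s1"]),
        "i3 = phi(i2, i1)": stmt(defs=["i3"], uses=["i2", "i1"], ctrl="branch"),
        "s3 = phi(s2, s1)": stmt(defs=["s3"], uses=["s2", "s1"], ctrl="branch"),
        "i2 = i3 + 1": stmt(defs=["i2"], uses=["i3"], ctrl="branch"),
        "s2 = s3 * s3": stmt(defs=["s2"], uses=["s3"], ctrl="branch"),
        "branch": stmt(uses=["i2"], ctrl="branch"),
        "return": stmt(uses=["s2"] if returns_s2 else [], required=True),
    }


with_value = live_statements(loop_example(returns_s2=True))
without_value = live_statements(loop_example(returns_s2=False))
```

With the returned value in place every statement becomes live; once the return no longer uses s2, only the return itself is required and the whole loop is dead, as on the two example slides.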
    • Restrictions on WOPT's SSA
      • Φ operands must be based on the same variable
        • No constants
        • No expressions
      • Overlapped live ranges are disallowed among versions of the same variable
      Motivation:
      • Preserves utility of built-in use-defs
      • Prevents increase in register pressure
      • Trivial to translate out of SSA form (just drop the Φ's and SSA subscripts)
      Caught many optimization mistakes (e.g. SSA form not preserved)
    • Copy Propagation
      • Follow the use-def edge and replace the variable by its r.h.s. value, for non-Φ definitions
      • If all uses are replaced, the def is rendered dead
      • Always beneficial if the r.h.s. is a leaf
      • If the r.h.s. is an expression, propagation creates a redundant computation and depends on later redundancy elimination to reverse it
      [Figure: before: x1 = a1 + b1; ... = x1. After: x1 = a1 + b1; ... = a1 + b1]
    • Important restrictions on copy propagation
      • Does not apply to Φ operands
      • Does not allow movement past another def of the same variable (no overlapped live range)
      Motivation:
      • Preserves utility of built-in use-defs
      • Prevents increase in register pressure
      • Trivial to translate out of SSA form (just drop the Φ's and SSA subscripts)
    • Example 1
      [Figure: left: x1 = a1 + b1; a2 = ; ... = x1. Right: x1 = a1 + b1; a2 = ; ... = a1 + b1]
    • Example 2
      • Dead φ's are needed to prevent over-propagation
      [Figure: left: x1 = a1 + b1; a2 = ; a3 = φ(a2, a1); return x1. Right (wrong): x1 = a1 + b1; a2 = ; a3 = φ(a2, a1); return a1 + b1]
    • Advantages of SSA-based optimizations
      • Dependency information is built in
        • No separate phase required to compute dependency information
      • Transformed output preserves SSA form
        • Little overhead to update dependencies
      • Efficient algorithms due to:
        • Sparse occurrence of nodes
        • Complexity dependent only on problem size (independent of program size)
        • Linear data flow propagation along use-def edges
        • Can customize treatment according to candidate
      • Can re-apply algorithms as often as needed
      • No separation of local optimizations from global optimizations
    • Alias Analysis Overview
      • Aliases are hidden defs and uses of data due to:
        • Accesses through pointers
        • Partial overlaps in storage (e.g. unions, equivalences)
        • Procedure calls
        • Raising of exceptions
      • Aliased unless proven otherwise
    • Address-taken Analysis
      • Refers to a symbol's address being taken and saved to a pointer
      • Symbols marked address-not-taken are never aliased
      • Done via a scan through the program
      • Assume address-taken if the entire scope of the symbol is not covered
      • Most effective in IPA for non-locals
      • Applied multiple times throughout compilation
      • Current limitation: taking the address of a structure member causes all its members to be address-taken
    • Alias Analysis Phases
      • First pass: Flow-free Analysis. Relies on:
        • User declarations (const, restrict)
        • Language rules
        • Results of alias classification phases (IPA, WOPT)
          • "Points-to Analysis in Almost Linear Time" by Bjarne Steensgaard, 1996
      • Construct SSA via annotated WHIRL
      • Second pass: Flow-sensitive Analysis
        • Looks up pointer values by following use-def
        • At Φ's, combine results conservatively:
          • If bases disagree, mark no-fixed-base
          • Enlarge range of addresses
          • Take lower nesting level
    • Representation for Alias Information
      • ALIAS_INFO provides (base, offset, size) and attributes
        • base can be fixed, dynamic or unknown
        • offset and offset+size give the lower and upper bounds of the accessed memory
        • assumes the entire range if offset is variable
      • Points-to analysis applies when base is dynamic
      • POINTS_TO is ALIAS_INFO + TY_IDX
    • Language-independent alias rules
      • Base rule: no alias if the base symbols are fixed and are not the same
      • Offset rule: for the same base symbol, no alias if there is no overlap
      • Static nesting rule:
        • no alias if static nesting levels are different
        • symbols at a higher nesting level are never accessible
        • symbols belonging to different functions at the same nesting level do not alias
      • Indirect rule:
        • Indirect memory accesses do not affect symbols whose addresses are not taken
      • Call rule:
        • A call to a function at level I can only affect symbols with nesting level < I
      • Use attributes to advantage: const, unique_pt, pure function, no-side-effect function
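As a rough illustration, the base rule and the offset rule reduce to a comparison of (base, offset, size) triples. The `Access` record and `may_alias` function below are hypothetical and far simpler than WOPT's ALIAS_INFO and POINTS_TO. They cover only fixed bases with constant offsets, and fall back to "aliased unless proven otherwise" when a base is unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Access:
    base: Optional[str]  # name of a fixed base symbol, or None when unknown
    offset: int          # byte offset from the base
    size: int            # number of bytes accessed


def may_alias(a, b):
    """Conservative alias test using only the base rule and the offset rule."""
    if a.base is None or b.base is None:
        return True   # aliased unless proven otherwise
    if a.base != b.base:
        return False  # base rule: distinct fixed bases never alias
    # Offset rule: same base, so alias only if the byte ranges
    # [offset, offset + size) overlap.
    return a.offset < b.offset + b.size and b.offset < a.offset + a.size


first_word = Access("a", 0, 4)
second_word = Access("a", 4, 4)
both_words = Access("a", 0, 8)
other_symbol = Access("b", 0, 4)
unknown = Access(None, 0, 4)
```

Here `first_word` and `second_word` share a base but do not overlap, so they are independent, while `both_words` overlaps each of them.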
    • C alias rules
      • ANSI type rule:
        • enabled under -OPT:alias=typed
        • objects are not aliased if they have different base types
      • Qualified type rule:
        • const and restrict qualifiers are taken into account in matching types
      • Restrict rule:
        • A pointer declared restrict is not aliased to another pointer declared restrict
    • FORTRAN alias rules
      • Fortran parameter rule:
        • Fortran parameters are not aliased to anything
        • A Fortran call has no side effect on address-taken variables (except the parameters passed)
      • Fortran pointers:
        • A pointer-based variable cannot itself be a pointer
        • A pointer-based variable cannot be a dummy argument or be used in COMMON, EQUIVALENCE, DATA or NAMELIST
    • Value Numbering
      • Technique to recognize when two expressions compute the same value
      • Traditionally applied on a per-basic-block basis
      • Value number vn is a unique location in the hash table
      • Leaves are given vn's based on their unique data values
      • vn of op(opnd0, opnd1) is Hash-func(op, opnd0, opnd1)
      • SSA enables value numbering to be applied globally
    • Global Value Numbering (GVN)
      • In SSA form, all occurrences of the same variable have the same value
        • Each SSA variable can be given a unique vn
        • Need only a single node to represent each def and all its uses
        • The defstmt field in the node points to its defining statement
      • Unique node to represent all occurrences of the same expression tree
        • E.g. a1+b1 and a1+b2 are different nodes, while a1+3 and a1+3 are the same node
        • Trivial to test if two expressions are equivalent
        • Storage can be minimized
      • Expression trees are now in the form of DAGs made of coderep nodes
      • GVN enables a single-node representation for indirect variables with identical address expressions
        • They can then be put into SSA form
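A minimal sketch of hash-based value numbering over SSA names follows. The `ValueTable` class is hypothetical: a Python dictionary stands in for the hash table, and a value number is simply the order in which a key was first seen, rather than a location in WOPT's htable.

```python
class ValueTable:
    """Assigns one value number per distinct leaf or expression."""

    def __init__(self):
        self._table = {}

    def _number(self, key):
        # The first lookup of a key allocates the next value number;
        # later lookups of an equal key return the same number.
        if key not in self._table:
            self._table[key] = len(self._table)
        return self._table[key]

    def leaf(self, value):
        """Value number of an SSA variable version or a constant."""
        return self._number(("leaf", value))

    def op(self, opcode, *operand_vns):
        """Value number of opcode applied to already-numbered operands."""
        return self._number((opcode,) + operand_vns)


vt = ValueTable()
a1, b1, b2, three = vt.leaf("a1"), vt.leaf("b1"), vt.leaf("b2"), vt.leaf(3)

sum_a1_b1 = vt.op("+", a1, b1)
sum_a1_b2 = vt.op("+", a1, b2)
first_a1_plus_3 = vt.op("+", a1, three)
second_a1_plus_3 = vt.op("+", a1, three)
```

Because each SSA version is a distinct leaf, `a1+b1` and `a1+b2` receive different numbers, while the two occurrences of `a1+3` collapse into one entry, so testing equivalence is a single integer comparison.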
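    • Example
      Program statement: a[i] = i
      [Figure: the hash table (htable) holds coderep nodes for &a, i, 4, a * node and a + node, each with opnd0 and opnd1 fields, and an ivar node with opnd0 and defstmt fields; the stmtrep node is an istore with lhs and rhs fields]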
    • Representing Aliasing
      Hidden defs and uses of scalars due to:
      • Procedure calls
      • Accesses through pointers
      • Partial overlaps in storage
      • Raising of exceptions
      • Procedure entries and exits (for non-locals)
    • Modeling use-defs under Aliasing
      • Introduce new operators for:
        • MayDefs: χ (chi)
        • MayUses: µ (not a definition)
      • Tag these nodes to existing nodes
      • χ factors defs at MayDefs
      • Single assignment property preserved
      [Figure: g1 = ; call foo() tagged with µ(g1) and g2 = χ(g1); later use of g2]
    • Example
      a and b overlaid on top of c in memory
        program    SSA form
        a =        a1 =        c2 = χ(c1)
        b =        b1 =        c3 = χ(c2)
        c          c3          µ(a1) µ(b1)
        a          a1          µ(c3)
        b          b1          µ(c3)
    • SSA for indirectly accessed data
      • To be consistent, all program data need to be represented in SSA form
      • For occurrences of **(p+1), the naïve approach:
        1. Put p into SSA form
        2. Put *(pi+1) into SSA form among identical i's
        3. Put *[*(pi+1)]j into SSA form among identical j's
      • Problems:
        1. A round of SSA construction for each level of indirection
        2. No clue about the relationship among related indirect variables, e.g. a[i] and a[i+1]
    • Introducing Virtual Variables
      • Associate each indirect variable with an imaginary scalar variable with identical alias characteristics
      • Virtual variables are tagged to indirect variables via χ's and µ's
      • One-pass SSA construction for both scalar and virtual variables
      • Assignment of virtual variables:
        • Related indirect accesses should share the same virtual variable, e.g. *p, *(p+1)
        • Flexible: more virtual variables mean greater compilation overhead but fewer missed optimization opportunities
    • Virtual Variables Example
      va[] is the virtual variable for accesses to array a
        program        SSA form
        a[i] = 3       a[i1] = 3        va[]2 = χ(va[]1)
        i = i + 1      i2 = i1 + 1
        a[i] = 4       a[i2] = 4        va[]3 = χ(va[]2)
        i = i - 1      i3 = i2 - 1
        return a[i]    return a[i3]     µ(va[]3)
      Possible to determine that a[i1] and a[i3] are the same by following the use-def edges of va[]
    • GVN for Indirect Variables
      • Virtual variables only serve an annotation purpose
      • Additional condition for two indirect variables with the same vn to be the same coderep node:
        • They must be tagged with the same virtual variable version
      • Result: indirect variables are now in SSA form (single node for its def and all its uses)
        • Possible only under GVN
      • Honors properties of indirect variables as both expressions and variables
      • Works consistently for multiple levels of indirection
    • Example of HSSA (GVN form of SSA)
      SSA form:
        a[i1] = a[i1] + 1    µ(va[]1), va[]2 = χ(va[]1)
        return a[i1]         µ(va[]2)
      [Figure: HSSA form. An istore stmtrep with lhs and rhs fields and a chi (res, opnd0) on va[]; coderep nodes for &a, i, 4, 1, the * and + nodes with opnd0 and opnd1, and two ivar nodes, each with opnd0, mu and defstmt fields; a return stmtrep whose rhs is the second ivar node]
    • Program Representation in WOPT
      Two categories of program constructs:
      1. Statements: have side effects
         • Can be reordered only if dependencies are preserved
         • "stmtrep" nodes
      2. Expression trees: no side effects
         • Contain only uses
         • Can be aggressively optimized
         • "coderep" nodes
      • Linear list of statements in each BB
      • Linear list of BBs in each PU preserves source order
      • CFG represented by pred/succ in the BBs
      • Expression trees hung from statement nodes
    • Zero SSA Versions
      • Objective: reduce SSA representation overhead with minimal impact on optimization results
      • Ideas:
        • Give up complete use-def chains in the presence of MayDefs
        • Use special versions, called zero versions, which:
          • Designate incomplete use-def information
          • Do not conform to the single assignment property
      • SSA and non-SSA versions can co-exist
      • Volatile variables only have zero versions
    • Determining Zero Versions
      • Definition: a real occurrence is any occurrence in the program before constructing SSA form
        • For virtual variables, real occurrences are at the occurrences of their indirect variables
      • Definition: zero versions are versions of variables that have no real occurrence, and whose values are derived from at least one χ with 0 or more intervening Φ's
      • Motivation: introduce zero versions only in the presence of χ's
      • Use-def edges are incomplete at zero versions
      • For dead store elimination: a Φ or χ whose result is a zero version cannot be deleted
    • Zero Versioning Algorithm
      Two rounds of SSA construction. First round:
      • Represented in WHIRL nodes
      • SSA version stored in the ST_IDX field of the WN
      • ver_stab is the table of SSA versions (VER_STAB_ENTRY)
      • Use WHIRL maps to associate occurrence nodes (OCC_TAB_ENTRY) to hold µ's and χ's
    • Zero Versioning Algorithm (continued)
      • In the first round's SSA, among versions without real occurrences:
        • If defined by χ, make it a zero version
        • If defined by Φ, make it a zero version if any of the Φ operands is a zero version
        • Use a worklist of Φ's to iterate until no more change
      • Second round of SSA construction (HSSA):
        • Only 1 coderep node is created for the zero version of each variable
    • Zero Version Example
        before zero versioning     after zero versioning
        x1 =                       x1 =
        x3 = φ(x2, x1)             x0 = φ(x0, x1)
        *p =    x4 = χ(x3)         *p =    x0 = χ(x0)
        x2 = φ(x3, x4)             x0 = φ(x0, x0)
        µ(x2)                      µ(x0)
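The marking rule can be sketched as a small fixed-point computation over the example above. The data layout is hypothetical (a dictionary from version name to its kind of definition and operands), and a simple repeat-until-stable loop stands in for the worklist of Φ's mentioned in the algorithm.

```python
def find_zero_versions(defs, real):
    """Return the set of versions that become zero versions.

    defs maps a version to (kind, operands), with kind in
    {"real", "phi", "chi"}; real is the set of versions that have a
    real occurrence in the program.
    """
    candidates = {v for v in defs if v not in real}
    # Versions without real occurrences that are defined by a chi.
    zero = {v for v in candidates if defs[v][0] == "chi"}
    changed = True
    while changed:
        changed = False
        for v in candidates - zero:
            kind, operands = defs[v]
            # A phi result is a zero version if any operand already is one.
            if kind == "phi" and any(o in zero for o in operands):
                zero.add(v)
                changed = True
    return zero


# The example slide: only x1 has a real occurrence.
defs = {
    "x1": ("real", []),
    "x3": ("phi", ["x2", "x1"]),
    "x4": ("chi", ["x3"]),
    "x2": ("phi", ["x3", "x4"]),
}
zero = find_zero_versions(defs, real={"x1"})
print(sorted(zero))
```

In this sketch x4 is marked first because it is defined by a χ, then x2 through its operand x4, then x3 through its operand x2, which corresponds to the three versions rewritten to x0 on the slide.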
    • Dead Indirect Store Elimination
      Program:
        void foo(void) {
          int i, a[40];
          for (i = 0; i < 40; i++)
            a[i] = i;
          return;
        }
      SSA form:
        i1 =
        loop: i3 = φ(i2, i1)    va[]3 = φ(va[]2, va[]1)
              a[i3] = i3;       va[]2 = χ(va[]3)
              i2 = i3 + 1
              if (i3 < 40)
        return
      • va[] has no use
      • Entire loop deleted
    • Dead Indirect Store Elimination
      • Straight application of the SSA dead store elimination algorithm will not identify many dead indirect stores (va[] does not represent a single location)
      • Need to enhance the algorithm by performing analysis along va[]'s use-def chain
        a[i1] = 3;        va[]2 = χ(va[]1)
        a[i1+1] = 4;      va[]3 = χ(va[]2)
        return a[i1];     µ(va[]3)
    • Copy Propagation through Indirect Variables
      • Based on the defstmt pointer of indirect variable nodes
      • Replace the indirect variable by the r.h.s. of its defining statement
      • Can propagate more than the closest def by following va[]'s use-def chain:
        1. Address expression must be identical
        2. Verify non-overlap of intervening indirect stores
        a[i1] = 3;                   va[]2 = χ(va[]1)
        a[i1+1] = 4;                 va[]3 = χ(va[]2)
        return a[i1] + a[i1+1];      µ(va[]3) µ(va[]3)
    • Generalization of SSA Form
      • Any construct that accesses memory can be represented in SSA form
      • At high levels of representation:
        • Array aggregates
        • Composite data structures: structs, classes (objects), C++ templates
      • At low levels of representation:
        • Bit-fields
      • SSA-based optimization algorithms can be applied to them
    • Optimizations of structs and fields
      • Large struct copies are often lowered to loops, making their optimization difficult
      • Apply SSA optimization before struct lowering:
        • Dead store elimination of struct copies
        • Copy propagation for structs
        • Take into account aliasing with field accesses
      • Apply SSA optimization again after lowering to fields
    • Optimizations for struct aggregates
        typedef struct ss { int f1; int f2; int f3; } S;
        S a;
      Copy propagation and dead store elimination before struct lowering:
        before: { S b; b = a; return b; }
        after:  { S b; return a; }
    • Optimizations for fields
      Copy propagation and dead store elimination after lowering structs to fields:
        source:    { S b; b = a; b.f2 = 99; return b; }
        lowered:   { S b; b.f1 = a.f1; b.f2 = a.f2; b.f3 = a.f3; b.f2 = 99; return b; }
        optimized: { S b; b.f1 = a.f1; b.f3 = a.f3; b.f2 = 99; return b; }
    • Optimizations of bit-fields
      • Bit-fields can be optimized more aggressively as individual fields
      • SSA optimizations applied before fields are lowered to extract/deposit:
        • Less associated aliasing due to smaller footprints
        • Same representation as scalars
      • After lowering to extract/deposit:
        • Promote word-wise accesses to register to minimize memory accesses
        • Redundancy elimination among masking operations
    • Sign and Zero Extension Optimizations
      Motivation:
      1. Sign/zero extension operations are needed when the integer size is smaller than the operation size
      2. They also show up when the user performs:
         • Casting
         • Truncation
      Especially important for Itanium:
      • Only unsigned loads provided
      • Mostly 64-bit operations in the ISA (the majority of operations in programs are 32-bit)
    • Sign/Zero Extension Operations
      Definitions:
      • sext n: sign bit is at bit n-1; all bits at position n and higher are set to the sign bit
      • zext n: unsigned integer of size n; all bits at position n and higher are set to zero
      Example:
        register short i, j, k;
        k = i + j;
      [Figure: expression tree k = sext 16 (i + j); zext if unsigned]
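The two definitions can be written out directly. This sketch uses Python's unbounded integers, so "all bits at position n and higher set to the sign bit" shows up as a negative result; the function names mirror the slide's operators but are otherwise illustrative.

```python
def zext(n, x):
    """Keep the low n bits of x; every bit at position n and above becomes 0."""
    return x & ((1 << n) - 1)


def sext(n, x):
    """Treat bit n-1 of x as the sign bit and copy it into all higher bits."""
    low = x & ((1 << n) - 1)
    sign_bit = 1 << (n - 1)
    return low - (1 << n) if low & sign_bit else low


# k = i + j on 16-bit shorts: the wide add is followed by sext 16.
i, j = 30000, 10000
k_signed = sext(16, i + j)    # wraps around, as a signed 16-bit short would
k_unsigned = zext(16, i + j)  # fits in 16 bits, so unchanged
```

The sum 40000 does not fit in a signed 16-bit value, so `sext 16` yields 40000 - 65536 = -25536, while `zext 16` leaves it at 40000.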
    • Sign Extension Elimination Algorithm
      • An extension to the SSA-based dead code elimination algorithm (dead code elimination is performed simultaneously)
      • Use a liveness bit mask for each variable (instead of a single flag)
      • Use a liveness bit mask for each expression tree node
      • Two phases:
        1. Propagate liveness of individual bits backward through use-defs, computation edges and control dependences
        2. Delete operations
      [Implementation in be/opt/opt_bdce.cxx]
    • Propagation of bit liveness phase
      • Top-down propagation in expression trees (from the operation result to its operands)
      • Based on the semantics of the operation, only the bits of the operand that affect the result are made LIVE
      • At leaves, follow use-def edges to the def statements of SSA variables
      • Propagation stops when no new liveness is found
    • Deletion of useless operations
      Pass over the entire program:
      • Assignment statements: delete if the bit mask of the SSA variable has no live bit
      • Other statements: delete if the required flag is not set
      • Zero/sign extension operations: delete in either of the following 2 cases:
        • Dead bits: affected bits are dead
        • Redundant extension: affected bits already have said values
    • Operations where Dead Bits Arise
      • Bit-wise AND with constant: bits AND'ed with 0 are dead
      • Bit-wise OR with constant: bits OR'ed with 1 are dead
      • EXTRACT_BITS and COMPOSE_BITS
      • "sext n (opnd)" and "zext n (opnd)": bits of opnd higher than n are dead
      • Right shifts: right bits of the operand that are shifted out are dead
      • Left shifts: left bits of the operand that are shifted out are dead
      • Others
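A sketch of how a liveness mask on an operation's result might be pushed down to its operand for some of the operations listed. It is not taken from opt_bdce.cxx: the function name and opcode strings are hypothetical, a 64-bit width is assumed, and right shifts are treated as logical shifts.

```python
WIDTH = 64
ALL_BITS = (1 << WIDTH) - 1


def operand_live_bits(op, result_live, arg):
    """Return the operand bits that can affect the live bits of the result.

    arg is the constant mask for and_const/or_const, the extension
    length n for sext/zext, and the shift count for shl/shr.
    """
    if op == "and_const":   # bits AND'ed with 0 are dead
        return result_live & arg
    if op == "or_const":    # bits OR'ed with 1 are dead
        return result_live & ~arg & ALL_BITS
    if op == "zext":        # operand bits at position n and above are dead
        return result_live & ((1 << arg) - 1)
    if op == "sext":        # as zext, but bit n-1 also feeds every higher bit
        live = result_live & ((1 << arg) - 1)
        if result_live >> arg:
            live |= 1 << (arg - 1)
        return live
    if op == "shr":         # result bit j comes from operand bit j + count
        return (result_live << arg) & ALL_BITS
    if op == "shl":         # result bit j comes from operand bit j - count
        return result_live >> arg
    raise ValueError(f"unhandled operation: {op}")
```

For example, if only the upper half of a 32-bit result of `sext 16` is live, the only live operand bit is the sign bit, bit 15; an operand mask of zero would mean the whole operand subtree is dead.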
    • Redundant Extension Operations
      Given "sext n (opnd)" or "zext n (opnd)", cases where the sign/zero extension can be determined redundant:
      1. opnd is a small integer type with size <= n (known values for higher bits)
      2. opnd is an integer constant
      3. opnd is a load of a memory location of size <= n
      4. opnd is another sign/zero extension operation with length <= n
      5. opnd is an SSA variable: follow use-def to its definition and analyse its r.h.s. recursively
    • Loop Canonicalization Example
        for (p = &a; p <= &a[99]; p++) *p = 0;
      becomes
        for (i = 0; i <= 99; i++) a[i] = 0;
    • Loop Canonicalization
      Effected through the combined effects of different phases:
      1. Loop normalization
         • Insert an artificial primary induction variable for every loop
         • Start at 0
         • Stride 1 and increasing
      2. Induction variable canonicalization
         a) Detect all induction variables in each loop (based on SSA graphs)
         b) Pick the primary IV among the induction variables
         c) For each secondary IV, insert an assignment at the start of the loop body expressing it in terms of the primary IV
      3. Copy propagation
         • All uses of secondary IVs are replaced by the primary IV
      4. Dead store elimination
         • All occurrences of secondary IVs are eliminated
    • Three Categories of Optimization related to Dependencies
      1. Delete useless computations
         • Dead store elimination
      2. Eliminate redundant computations
         • Common sub-expression
         • Loop-invariant code motion
         • Partial redundancy elimination
      3. Re-order computations
         • Loop transformations
         • Instruction scheduling
      SSA provides a solution for 1 and 2. We have covered 1 today. 2 belongs to Main-OPT.
    • Optimizations Performed by WOPT
      Pre-optimizer:
      • Goto conversion
      • Loop normalization
      • Alias analysis (flow-free and flow-sensitive)
      • Tail recursion elimination
      • Dead store elimination
      • Induction variable canonicalization
      • Copy propagation
      • Dead code elimination
      • Compute def-use chains for LNO and IPA
      Main optimizer:
      • Partial redundancy elimination based on the SSAPRE framework
        • Global common subexpression
        • Loop invariant code motion
        • Strength reduction
        • Linear function test replacement
      • Value-number-based full redundancy elimination
      • Induction variable elimination
      • Register promotion
      • Bitwise dead store elimination
      • Pass alias info to CG
    • Pre-opt Implementation Flow
      1. Goto conversion (opt_goto.cxx)
      2. Loop normalization (opt_loop.cxx:Normalize_loop())
      3. Create optimizer symbol table (opt_sym.cxx)
      4. Alias classification (opt_alias_class.cxx)
      5. Create CFG (opt_cfg.cxx)
      6. Control flow analysis (opt_cfg.cxx)
         • Dominators
         • Dominance frontiers
         • Post-dominators
         • Post-dominance frontiers
         • Unreachable code
         • If-conversion
         • Represent loop structures
      7. Tail recursion elimination (opt_tail.cxx)
      8. Flow-free alias analysis (opt_alias_analysis.cxx:Compute_FFA())
      9. Construct WHIRL-based SSA (opt_ssa.cxx)
      10. Flow-sensitive alias analysis (opt_alias_analysis.cxx:Compute_FSA())
    • Pre-opt Phase Structure (continued)
      11. Dead store elimination (opt_dse.cxx)
      12. Find zero versions (opt_dse.cxx)
      13. Create HSSA (stmtrep, coderep) (opt_htable.cxx)
      14. Induction variable canonicalization (opt_ivr.cxx)
      15. Copy propagation (opt_prop.cxx)
      16. Fold indirect variables to direct (opt_revise_ssa.cxx)
      17. Dead code elimination (opt_dce.cxx)
      18. Iterate until no more change:
          a. Control flow transformation (opt_cfg_trans.cxx)
          b. Update SSA (opt_rename.cxx)
          c. Copy propagation (opt_prop.cxx)
          d. Fold indirect variables to direct (opt_revise_ssa.cxx)
          e. Dead code elimination (opt_dce.cxx)
      19. If IPA or LNO:
          • Emit WHIRL (opt_emit.cxx)
          • Create DU info (opt_du.cxx)