D Wopt Ii

  1. Section D: Global Scalar Optimization II (Main-OPT)
  2. Three Categories of Optimization related to Dependencies (Re-Cap)
     1. Delete useless computations
        • Dead store elimination
     2. Eliminate redundant computations
        • Common sub-expression
        • Loop-invariant code motion
        • Partial redundancy elimination
     3. Re-order computations
        • Loop transformations
        • Instruction scheduling
     This session covers category 2.
  3. Partial Redundancy Elimination
     • Partial redundancy: a computation that is redundant on some path of execution
     • Inserting the computation on the non-redundant path yields full redundancy
     • The fully redundant computation can then be removed
     [Figure: control-flow graphs labeled "Full redundancy" and "Partial redundancies", each with occurrences of a+b]
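
The insert-then-remove idea on this slide can be sketched at the source level. This is a minimal illustration, not compiler output: the function names `before_pre` and `after_pre` and the temporary `t` are hypothetical, and `a + b` stands for any side-effect-free expression.

```c
/* Before PRE: when cond is true, a + b is computed twice, so the second
   computation is partially redundant (redundant on the cond path only). */
int before_pre(int a, int b, int cond) {
    int x = 0;
    if (cond) {
        x = a + b;      /* first occurrence */
    }
    int y = a + b;      /* redundant only when cond != 0 */
    return x + y;
}

/* After PRE: the computation is inserted on the path that lacked it.
   The final occurrence is now fully redundant and becomes a reload of t. */
int after_pre(int a, int b, int cond) {
    int x = 0;
    int t;
    if (cond) {
        t = a + b;      /* computed once and saved */
        x = t;
    } else {
        t = a + b;      /* inserted computation on the non-redundant path */
    }
    int y = t;          /* reload instead of recomputing */
    return x + y;
}
```

Every path through `after_pre` evaluates `a + b` exactly once, whereas the `cond` path of `before_pre` evaluates it twice.
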
  4. PRE is Better than Loop-invariant Code Motion
     [Figure: control-flow graph with occurrences of a+b and assignments a←]
  5. Partial Redundancy Elimination Algorithm
     Lazy Code Motion is the best PRE algorithm:
     1. Computationally optimal: no other compile-time code placement can produce fewer computations at run time
        • Regardless of the path taken during execution
     2. Live range optimal: no other computationally optimal code placement produces shorter live ranges for the temporaries introduced
     No bi-directional data flow is needed
     SSAPRE is the same algorithm applied to SSA graphs
        • Sparse version of lazy code motion
        • More intuitive for understanding the algorithm
  6. SSAPRE Motivation
     Full redundancy starts at the first computation and transitions to partial redundancy at dominance frontiers
     [Figure: an occurrence of a+b, its dominance region labeled "fully redundant region", the "dominance frontiers", and the "partially redundant region" beyond them]
  7. Using SSA to solve redundancy problems
     • Construct SSA form for expressions by using a hypothetical temporary h to model the problem
     • Def of h: compute the expression and save the result
     • Use of h: reload the earlier saved result
     • Same SSA version implies same value computed
     • Insertion is only required at incoming edges of Φ's
     • PRE data flow analysis is performed on the SSA graph
  8. SSAPRE Algorithm
     SSA construction
        1. Φ insertion
        2. Rename
     Data flow analysis
        3. DownSafety
        4. WillBeAvail
     Transform program
        5. Finalize SSA
        6. CodeMotion
  9. Step 1: Φ Insertion
     As in SSA for variables:
        • Dominance frontiers of the expression
        [Figure: a1+b1 [h] followed by h = Φ(h, h)]
     Additional rule:
        • Changes to variable values, captured by the φ's of variables
        [Figure: a1+b1 [h] and a2 = ..., followed by a3 = φ(a1, a2) and h = Φ(h, h)]
  10. Step 2: Rename
      Assigns SSA versions to h as in SSA renaming
      Additional rule:
         • Occurrences get the same SSA version of h only if the SSA versions of their variables are identical
      [Figure: renaming example with a1+b1 [h1], h = Φ(h, ⊥), b2 = ..., a1+b1 [h1], a1+b1 [h], and a1+b2 [h2]]
  11. Step 3: DownSafety
      Insertion is allowed only at Φ's where the expression is anticipated (downsafe)
         • Needed for computational optimality
      Initialize all Φ's to downsafe
      Mark Φ's that can reach exit without any occurrence as not-downsafe (boundary condition)
      Propagate the not-downsafe attribute backward along use-def edges
      Exclude use-def edges that have real occurrences (marked via has_real_use)
  12. Example
      [Figure: SSA graph with real occurrences a1+b1 [h1←] and a1+b1 [h2←]; h3 = Φ(h1, ⊥), ds=1; h4 = Φ(h2, ⊥), ds=1; real occurrence a1+b1 [h3], has_real_use=1; h5 = Φ(h3, h4), ds=0; a1+b1 [h5]; exit]
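
The backward propagation of not-downsafe can be sketched on this example graph. This is a minimal sketch under stated assumptions: the `Phi` struct, the `BOTTOM`/`REAL_DEF` markers, and the function names are hypothetical, the graph encoding is one reading of the flattened figure, and the boundary condition (exit reachable without an occurrence) is supplied as an input flag rather than computed from a CFG.

```c
#include <stdbool.h>

enum { BOTTOM = -1, REAL_DEF = -2 };  /* operand definitions that are not Φ's */
#define MAX_OPNDS 3

typedef struct {
    int  nopnds;
    int  def[MAX_OPNDS];           /* index of the defining Φ, BOTTOM, or REAL_DEF */
    bool has_real_use[MAX_OPNDS];  /* a real occurrence uses this operand's version */
    bool reaches_exit;             /* boundary: exit reachable without any occurrence */
    bool downsafe;                 /* result */
} Phi;

/* Clear downsafe on the Φ that defines an operand, then continue backward
   through that Φ's own operands. Edges with a real occurrence are excluded. */
static void reset_downsafe(Phi *phis, int def, bool has_real_use) {
    if (def < 0 || has_real_use) return;   /* not a Φ, or excluded edge */
    Phi *p = &phis[def];
    if (!p->downsafe) return;              /* already handled */
    p->downsafe = false;
    for (int j = 0; j < p->nopnds; j++)
        reset_downsafe(phis, p->def[j], p->has_real_use[j]);
}

void compute_downsafe(Phi *phis, int n) {
    /* Initialize all Φ's to downsafe, except the boundary Φ's. */
    for (int i = 0; i < n; i++)
        phis[i].downsafe = !phis[i].reaches_exit;
    /* Propagate not-downsafe backward along use-def edges. */
    for (int i = 0; i < n; i++)
        if (!phis[i].downsafe)
            for (int j = 0; j < phis[i].nopnds; j++)
                reset_downsafe(phis, phis[i].def[j], phis[i].has_real_use[j]);
}

/* Example graph: index 0 is h3, index 1 is h4, index 2 is h5. */
void build_downsafe_example(Phi phis[3]) {
    phis[0] = (Phi){ .nopnds = 2, .def = { REAL_DEF, BOTTOM } };
    phis[1] = (Phi){ .nopnds = 2, .def = { REAL_DEF, BOTTOM } };
    phis[2] = (Phi){ .nopnds = 2, .def = { 0, 1 },
                     .has_real_use = { true, false }, .reaches_exit = true };
}
```

Under this encoding h5 is the boundary Φ. Its h3 operand is shielded by the real occurrence a1+b1 [h3], so h3 stays downsafe, while h4 has no such occurrence and becomes not-downsafe.
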
  13. Step 4: WillBeAvail
      Purpose: identify Φ's for PRE insertion
      Uses two forward propagations along def-use edges
      Part 1: for computational optimality
         • can_be_avail gives the possible insertion points for computational optimality
         • Initialize all Φ's to can_be_avail
         • Mark Φ's that are not downsafe and have at least one ⊥ operand as not-can_be_avail (boundary condition)
         • Propagate the not-can_be_avail attribute forward along def-use edges to not-downsafe nodes
  14. Example
      [Figure: same SSA graph with can_be_avail (cba) flags; h3 = Φ(h1, ⊥), ds=1, cba=1; h4 = Φ(h2, ⊥), ds=0, cba=0; real occurrence a1+b1 [h3]; h5 = Φ(h3, h4), ds=0, cba=1 then cba=0; a1+b1 [h5]; exit]
  15. Step 4: WillBeAvail (continued)
      Part 2: for live range optimality
      Find the latest insertion points
      Among can_be_avail Φ's:
         • later means insertion can be postponed
         • not-later means this will be the latest insertion point
      Key idea: if a Φ operand is defined by a real computation, insertion cannot be postponed past that Φ without introducing new redundancy
         • Mark can_be_avail Φ's that have any operand defined by a real computation as not-later (boundary condition)
         • Propagate the not-later attribute forward along def-use edges
  16. Example
      [Figure: same SSA graph with later flags; h3 = Φ(h1, ⊥), ds=1, cba=1, later=0; h4 = Φ(h2, ⊥), ds=0, cba=0; real occurrence a1+b1 [h3]; h5 = Φ(h3, h4), ds=0, cba=0; a1+b1 [h5]; exit]
  17. Another example
      [Figure: control-flow graph with blocks numbered 1 to 7, containing a1+b1 [h1←]; h3 = Φ(h1, ⊥) with ds=0, cba=0; a2 = ...; a3 = φ(a1, a2, a3) and h5 = Φ(h3, ⊥, h5) with ds=1, cba=1, later=1; a3+b1 [h5]; and two exits]
      If later were not considered, placement would be performed at blocks 4 and 5
  18. Insertion Criteria
      WillBeAvail = can_be_avail ∧ not later
      At WillBeAvail Φ's, insert at a Φ operand if either:
         • The operand is ⊥, or
         • has_real_use is false and the operand is defined by a Φ that is not WillBeAvail
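
The two forward propagations and the insertion criteria can be sketched together on the three-Φ example from the preceding slides (h3, h4, h5). This is a sketch under assumptions, not the Open64 implementation: the `Phi` struct, the `BOTTOM`/`REAL_DEF` markers, and all function names are hypothetical, downsafe is taken as an input, and "operand defined by a real computation" is modeled as an operand whose definition is a real occurrence or whose has_real_use flag is set.

```c
#include <stdbool.h>

enum { BOTTOM = -1, REAL_DEF = -2 };  /* operand definitions that are not Φ's */
#define MAX_OPNDS 3

typedef struct {
    int  nopnds;
    int  def[MAX_OPNDS];           /* index of the defining Φ, BOTTOM, or REAL_DEF */
    bool has_real_use[MAX_OPNDS];
    bool downsafe;                 /* input, from the DownSafety step */
    bool can_be_avail, later, will_be_avail;
    bool insert[MAX_OPNDS];        /* output: insert the computation at this operand */
} Phi;

static bool has_bottom(const Phi *p) {
    for (int j = 0; j < p->nopnds; j++)
        if (p->def[j] == BOTTOM) return true;
    return false;
}

static bool has_real_def(const Phi *p) {
    for (int j = 0; j < p->nopnds; j++)
        if (p->def[j] == REAL_DEF || p->has_real_use[j]) return true;
    return false;
}

/* Forward along def-use edges, only into not-downsafe Φ's. */
static void reset_can_be_avail(Phi *phis, int n, int g) {
    phis[g].can_be_avail = false;
    for (int f = 0; f < n; f++)
        for (int j = 0; j < phis[f].nopnds; j++)
            if (phis[f].def[j] == g && !phis[f].has_real_use[j] &&
                !phis[f].downsafe && phis[f].can_be_avail)
                reset_can_be_avail(phis, n, f);
}

/* Forward along def-use edges, into Φ's still marked later. */
static void reset_later(Phi *phis, int n, int g) {
    phis[g].later = false;
    for (int f = 0; f < n; f++)
        for (int j = 0; j < phis[f].nopnds; j++)
            if (phis[f].def[j] == g && phis[f].later)
                reset_later(phis, n, f);
}

void compute_will_be_avail(Phi *phis, int n) {
    /* Part 1: can_be_avail */
    for (int i = 0; i < n; i++) phis[i].can_be_avail = true;
    for (int i = 0; i < n; i++)
        if (phis[i].can_be_avail && !phis[i].downsafe && has_bottom(&phis[i]))
            reset_can_be_avail(phis, n, i);
    /* Part 2: later, defined only among can_be_avail Φ's */
    for (int i = 0; i < n; i++) phis[i].later = phis[i].can_be_avail;
    for (int i = 0; i < n; i++)
        if (phis[i].later && has_real_def(&phis[i]))
            reset_later(phis, n, i);
    /* WillBeAvail = can_be_avail and not later */
    for (int i = 0; i < n; i++)
        phis[i].will_be_avail = phis[i].can_be_avail && !phis[i].later;
    /* Insertion criteria, applied per operand */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < phis[i].nopnds; j++) {
            int d = phis[i].def[j];
            bool from_unavail_phi = d >= 0 && !phis[i].has_real_use[j] &&
                                    !phis[d].will_be_avail;
            phis[i].insert[j] = phis[i].will_be_avail &&
                                (d == BOTTOM || from_unavail_phi);
        }
}

/* Example graph: index 0 is h3, index 1 is h4, index 2 is h5. */
void build_wba_example(Phi phis[3]) {
    phis[0] = (Phi){ .nopnds = 2, .def = { REAL_DEF, BOTTOM }, .downsafe = true };
    phis[1] = (Phi){ .nopnds = 2, .def = { REAL_DEF, BOTTOM }, .downsafe = false };
    phis[2] = (Phi){ .nopnds = 2, .def = { 0, 1 },
                     .has_real_use = { true, false }, .downsafe = false };
}
```

On this encoding only h3 ends up WillBeAvail, and the single insertion lands on its ⊥ operand. h4 and h5 are not can_be_avail, so nothing is inserted for them.
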
  19. Step 5: Finalize
      Shape h into real SSA form:
         • For each real occurrence:
            • Set the reload flag if the occurrence is redundant
            • Set the save flag if the computation needs to be saved
         • Perform insertion for Φ operands marked insert
         • Remove Φ's with undefined operands
  20. Step 6: CodeMotion
      • Introduce real temporary t
      • Transform the program
      • SSA form for t is translated from the SSA form for h
  21. SSAPRE's Practical Implementation
      • Iterate through a work list of lexically identical expressions and apply the SSAPRE algorithm
      • Sufficient to work on height-1 expressions
      • No redundancy in a sub-tree implies no redundancy in the ancestral part of the tree
      [Figure: expression tree for (a+b)-c; redundancy in (a+b) is replaced by t, then redundancy in (t-c) is replaced by u]
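  22. PRE naturally extends to Loads or Indirect Loads
      Load PRE should take advantage of LHS occurrences
      [Figure: a store *p ← followed by a load of *p; after load PRE, t ← holds the stored value, the store becomes *p ← t, and the load of *p becomes t]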
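
A source-level sketch of a load made redundant by an earlier store to the same location (the left-hand-side occurrence). The function names and the temporary `t` are hypothetical, and the sketch assumes nothing between the store and the load can modify `*p`.

```c
/* Before load PRE: the load of *p re-reads what was just stored. */
int before_lpre(int *p, int v) {
    *p = v * 2;         /* store: a left-hand-side occurrence of *p */
    return *p + 1;      /* load of *p, redundant with the stored value */
}

/* After load PRE: the stored value is kept in t and the load becomes t. */
int after_lpre(int *p, int v) {
    int t = v * 2;
    *p = t;             /* the store is kept; only the load is removed */
    return t + 1;
}
```
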
  23. Dead Store Elimination can be viewed as Redundancy in L-values
      In L-value redundancy:
         • A use kills the l-value, which renders the store non-dead
         • Earlier stores are the ones to be deleted
         • Redundancy problem in the opposite direction
      [Figure: "Expr redundancy" with two occurrences of a+b, next to "L-value redundancy" with two stores a←]
  24. Store Partial Redundancy Elimination (SPRE)
      • Result is Partial Dead Store Elimination
      • Based on inverted CFG
      • Requires Static Single Use (SSU) form
         • Inverse Φ's placed at branches
         • Currently implemented via pattern matching on SSA form
  25. Example of Partial Dead Store Elimination
      [Figure: control-flow graph with stores a← and uses of a]
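
Since the figure does not survive extraction, here is a small source-level sketch of a partially dead store. It illustrates the idea only and is not a reconstruction of the slide's graph; the global `g_a` and both function names are hypothetical.

```c
int g_a;  /* hypothetical global standing in for memory location a */

/* Before: the first store is dead on the cond path (overwritten before
   any use) but live on the other path, so it is only partially dead. */
int before_spre(int v, int cond) {
    g_a = v;
    if (cond) {
        g_a = v + 1;    /* overwrites the first store without using it */
        return 0;
    }
    return g_a;         /* this use keeps the first store live here */
}

/* After: the store is sunk onto the path where it is actually needed. */
int after_spre(int v, int cond) {
    if (cond) {
        g_a = v + 1;
        return 0;
    }
    g_a = v;
    return g_a;
}
```

Both versions leave the same final value in `g_a` and return the same result, but the `cond` path of `after_spre` performs one store instead of two.
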
  26. Compare with this earlier example for PRE
      [Figure: the earlier PRE control-flow graph, with occurrences of a+b and assignments a←]
  27. Register Promotion (Register Variable Identification)
      Identify register allocation candidates by promoting scalar variables to pseudo-registers (TNs in CG)
         • Register allocation with an infinite number of registers
      Trivial for local variables with no alias (address not taken)
         • Local RVI just works on the symbol table
      For other variables, the problem translates to placement optimization of loads and stores
      Register promotion is:
         1. PRE of loads, followed by
         2. PRE of stores
      Performing LPRE first creates more opportunities for SPRE
  28. Speculative Partial Redundancy Elimination
      Applicable to branches inside loops
      [Figure: loop containing a branch with an occurrence of a+b]
      Effective register promotion requires speculation
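
A source-level sketch of speculation for a branch inside a loop. The function names and the temporary `t` are hypothetical. Unsigned operands are used so that evaluating `a + b` ahead of the loop cannot fault or overflow, which is the condition that makes this kind of speculation safe.

```c
/* Before: a + b is recomputed on every iteration that takes the branch. */
unsigned long before_spec(const int *flags, int n, unsigned a, unsigned b) {
    unsigned long sum = 0;
    for (int i = 0; i < n; i++)
        if (flags[i])
            sum += a + b;
    return sum;
}

/* After speculative PRE: a + b is computed once ahead of the loop, even
   though some executions never take the branch and never need the value. */
unsigned long after_spec(const int *flags, int n, unsigned a, unsigned b) {
    unsigned long sum = 0;
    unsigned t = a + b;     /* speculative: not anticipated on every path */
    for (int i = 0; i < n; i++)
        if (flags[i])
            sum += t;
    return sum;
}
```

The hoisted computation is not downsafe, so the placement can add one evaluation on executions that never take the branch. The trade pays off when the branch is taken more than once.
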
  29. Example of Register Promotion
      a is a memory location
      [Figure: code before and after LPRE and SPRE; loads of a become r←a, uses of a become r, and stores a← become r← paired with a←r]
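
The combined effect of load PRE followed by store PRE can be sketched on a loop that updates a memory location. This is an illustration rather than the slide's exact figure: the global `g_count`, the local `r`, and the function names are hypothetical.

```c
int g_count;  /* hypothetical global: a memory location that cannot be a local register as written */

/* Before: every iteration loads g_count from memory and stores it back. */
void before_promotion(int n) {
    for (int i = 0; i < n; i++)
        g_count = g_count + i;
}

/* After register promotion: LPRE leaves one load ahead of the loop, the
   loop body works on r, and SPRE leaves one store after the loop. */
void after_promotion(int n) {
    int r = g_count;        /* r <- a */
    for (int i = 0; i < n; i++)
        r = r + i;          /* uses and defs of a become r */
    g_count = r;            /* a <- r */
}
```

When `n` is 0 the promoted version still performs one load and one store that the original did not, which is the kind of speculation the previous slide says effective register promotion requires.
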
  30. Global Value Numbering-based Redundancies
      • PRE recognizes redundancies among lexically identical expressions
      • Redundancies can also arise from expressions that are not lexically identical
      • Global value numbering identifies non-identical expressions that compute the same value
      • GVN algorithm based on Taylor Simpson's paper
      • After GVN, remove redundancies among expressions with the same value number
  31. Value-number-based Full Redundancy Elimination (VNFRE)
      Expressions involved in PRE need not compute the same value
         • Φ's merge different values at confluence points
      Only full redundancies are possible among expressions computing the same value
      VNFRE eliminates full redundancies among expressions with the same value number
      Uses the SSAPRE algorithm specialized to full redundancies
      Example:
         for (i=1, j=1; i <= n; i*=2, j*=2) ;
         return i == j;
      becomes
         return 1;
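
The slide's example can be wrapped into complete functions to check the claim. The function names are hypothetical, and the sketch assumes `n` is small enough that the doubling does not overflow `int`.

```c
/* The loop from the slide: i and j start equal and receive identical
   updates, so value numbering can assign them the same value number. */
int compare_before(int n) {
    int i, j;
    for (i = 1, j = 1; i <= n; i *= 2, j *= 2)
        ;
    return i == j;
}

/* What remains once i == j is known to compare equal value numbers and
   the now-unused loop is removed. */
int compare_after(int n) {
    (void)n;            /* n no longer influences the result */
    return 1;
}
```
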
  32. Strength Reduction
      Induction expression: a linear function of an induction variable
      Strength reduction replaces computation of an induction expression by induction increments
      Reverse of Loop Canonicalization
      Definition: induction expressions are injured by updates to the induction variable
      Approach:
         1. Apply PRE to induction expressions while disregarding the injuries
         2. After PRE, repair the injuries by updating the temporaries that store the induction expressions
      [Figure: before, i*4; i=i+1; i*4. After, t = i*4; t; i=i+1; t=t+4; t]
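
The injury-and-repair view can be sketched in source form. The function names and the temporary `t` are hypothetical. The update `i = i + 1` injures `i*4`, and the repair is the matching `t = t + 4`.

```c
/* Before: the induction expression i*4 is recomputed each iteration. */
long before_sr(int n) {
    long sum = 0;
    for (int i = 0; i < n; i = i + 1)
        sum += i * 4;
    return sum;
}

/* After: t holds i*4. PRE treats the later occurrences as redundant and
   each injury (i = i + 1) is repaired by the increment t = t + 4. */
long after_sr(int n) {
    long sum = 0;
    int t = 0 * 4;                  /* t = i*4 at loop entry, where i is 0 */
    for (int i = 0; i < n; i = i + 1, t = t + 4)
        sum += t;                   /* use of t replaces the multiply */
    return sum;
}
```
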
  33. Linear Function Test Replacement
      Replace uses of induction variables in termination tests by strength-reduced induction expressions
      Purpose: to render the original induction variable dead
      Induction expressions are linear functions of the induction variable
      Performed during SSAPRE of induction expressions
         • Instances of LFTR opportunities are represented as COMP_OCCUR in the SSA graph
         • Replacement is possible when the induction expression is available at the point of comparison
         • Adjust the r.h.s. of the comparison according to the form of the induction expression as a function of the induction variable
      Example
         During SSAPRE of (i*4), (i<n) is changed to (t<n*4)
         The new expression (n*4) becomes a new PRE candidate
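
The slide's example, rewriting (i<n) as (t<n*4), can be sketched on a strength-reduced loop. The function names and the variable `limit` are hypothetical, and the sketch assumes `n*4` does not overflow `int`.

```c
/* After strength reduction only: t tracks i*4, but i is still needed
   for the termination test. */
long before_lftr(int n) {
    long sum = 0;
    int t = 0;
    for (int i = 0; i < n; i++, t += 4)
        sum += t;
    return sum;
}

/* After LFTR: the test i < n becomes t < n*4, so i is dead and removed.
   The new expression n*4 is loop-invariant and is computed once. */
long after_lftr(int n) {
    long sum = 0;
    int limit = n * 4;              /* the new PRE candidate, hoisted */
    for (int t = 0; t < limit; t += 4)
        sum += t;
    return sum;
}
```
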
  34. Combined Effects of Main-OPT Phases
      Combining effects of:
         • PRE
         • Strength reduction
         • Linear function test replacement
      Example:
         for (i = 0; i <= 99; i++) a[i] = 0;
      becomes
         for (i = 0, p = &a[0]; p <= &a[99]; i++, p++) *p = 0;
      After dead store elimination:
         for (p = &a[0]; p <= &a[99]; p++) *p = 0;
  35. Main-opt Phase Structure
      1. Lower bit-field code (opt_revise_ssa.cxx)
      2. Expression PRE (opt_etable.cxx)
         • Strength reduction (opt_estr.cxx)
         • Code hoisting (opt_ehoist.cxx)
         • Linear function test replacement (opt_lftr2.cxx)
      3. Value-number-based Full Redundancy Elimination (opt_vn.cxx)
      4. Dead code elimination (opt_dce.cxx)
      5. Register promotion:
         a. Local register variable identification (opt_sym.cxx)
         b. Load PRE (opt_ltable.cxx)
            • Live range shrinking
         c. Store PRE (opt_stable.cxx)
      6. Bitwise dead code elimination (opt_bdce.cxx)
      7. Emit WHIRL (opt_htable_emit.cxx)
