Detecting and enhancing loop-level parallelism
With a large dependence distance,
more potential parallelism can be
exposed by loop unrolling.
Longer distances may provide
enough parallelism to keep the
processor busy.
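As a rough illustration (a Python sketch, not taken from the slides), consider a loop of the form a[i] = a[i - d] + b[i], whose dependence distance is d. The function names and the list-based model are assumptions made for this example. Any d consecutive iterations read only values produced at least d iterations earlier, so after unrolling by d they are independent of each other.

```python
def run_sequential(a, b, d):
    """Reference loop: a[i] = a[i - d] + b[i], one iteration at a time."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + b[i]
    return a


def run_unrolled(a, b, d):
    """Same loop unrolled by d; each group of d iterations is independent."""
    a = list(a)
    for start in range(d, len(a), d):
        group = range(start, min(start + d, len(a)))
        # Every read a[i - d] refers to an element outside this group,
        # so all results can be computed before any of them is stored.
        # This models issuing the d iterations in parallel.
        results = [a[i - d] + b[i] for i in group]
        for i, value in zip(group, results):
            a[i] = value
    return a
```

With d = 1 each group holds a single iteration and nothing can overlap; a larger d gives proportionally more independent work per unrolled body.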
docsity.com
Back Substitution
Back substitution increases the
amount of parallelism, but sometimes it
also increases the amount of
computation required.
These techniques can be applied both:
- within a basic block; and
- within a loop
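One way to see this trade-off inside a loop is to back-substitute a simple recurrence into itself. The sketch below is an assumed example, not one from the slides: a[i] = a[i - 1] + b[i] becomes a[i] = a[i - 2] + (b[i - 1] + b[i]). The dependence distance grows from 1 to 2, at the cost of one extra addition per iteration. The function names are hypothetical.

```python
def original(a0, b):
    """a[i] = a[i-1] + b[i]: each iteration depends on the previous one."""
    a = [a0] + [0] * (len(b) - 1)
    for i in range(1, len(b)):
        a[i] = a[i - 1] + b[i]
    return a


def back_substituted(a0, b):
    """a[i] = a[i-2] + (b[i-1] + b[i]): dependence distance is now 2."""
    n = len(b)
    a = [a0] + [0] * (n - 1)
    if n > 1:
        a[1] = a0 + b[1]  # the first step still uses the original form
    for i in range(2, n):
        # b[i-1] + b[i] carries no loop dependence and can be computed
        # early; only the outer addition sits on the recurrence chain.
        a[i] = a[i - 2] + (b[i - 1] + b[i])
    return a
```

Both versions produce the same values (b[0] is unused in this formulation); the second simply does more arithmetic to shorten the chain between iterations.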
Eliminating dependent computations
Within a basic block:
Algebraic simplification of
expressions is used, together with
an optimization called copy propagation,
which eliminates operations that copy
values.
For example, copy propagation applied to
DADDUI R1,R2,#4
DADDUI R1,R1,#4
results in
DADDUI R1,R2,#8
Here, one computation is eliminated, which
removes the dependence between the two instructions.
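The folding step above can be sketched as a tiny peephole rule. This is a minimal illustration under stated assumptions: instructions are modelled as (dest, src, imm) tuples, the helper name fold_daddui is hypothetical, and the check that the combined immediate still fits the instruction's immediate field is deliberately omitted.

```python
def fold_daddui(first, second):
    """Fold two dependent DADDUI instructions into one when possible.

    Each instruction is a tuple (dest, src, imm), read as
    DADDUI dest,src,#imm. Returns the folded tuple, or None when the
    pattern does not match.
    """
    dest1, src1, imm1 = first
    dest2, src2, imm2 = second
    # Pattern: the second instruction reads the first one's result and
    # overwrites the same register, so the intermediate value is dead.
    if dest1 == dest2 and src2 == dest1:
        return (dest1, src1, imm1 + imm2)
    return None
```

For the pair in the slide, fold_daddui(("R1", "R2", 4), ("R1", "R1", 4)) yields ("R1", "R2", 8).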
Optimization:
Tree-Height Reduction Technique
It is also possible to increase the
parallelism of the code,
possibly at the cost of increasing the
number of operations.
Such an optimization is called tree-height
reduction, because it reduces the height of
the tree that represents the computation.
For example, the code sequence
ADD R1,R2,R3
ADD R4,R1,R6
ADD R8,R4,R7
requires three cycles to execute,
because each instruction depends on its
immediate predecessor, so the instructions
cannot be issued in parallel.
Taking advantage of associativity,
the code can be transformed
and written in the form shown below:
ADD R1,R2,R3
ADD R4,R6,R7
ADD R8,R1,R4
This sequence can be computed in two
execution cycles by issuing the first two
instructions in parallel.
Note that R4 now holds R6+R7 rather than R1+R6,
so this assumes R4 is only a temporary.
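The cycle counts can be checked with a small model of true (read-after-write) dependences. This is a sketch under simplifying assumptions that are not stated in the slides: every ADD takes one cycle, issue width is unlimited, and only true dependences constrain issue. The serial chain modelled here is the one in which the third ADD reads R4, so that each instruction depends on its predecessor. The name min_cycles is hypothetical.

```python
def min_cycles(instructions):
    """Length of the longest true-dependence (RAW) chain.

    Each instruction is (dest, src1, src2). Assumes unit latency and
    unlimited issue width, so the result is a lower bound on cycles.
    """
    ready = {}  # register -> cycle in which its latest value is produced
    longest = 0
    for dest, src1, src2 in instructions:
        # An instruction executes one cycle after its latest operand.
        cycle = 1 + max(ready.get(src1, 0), ready.get(src2, 0))
        ready[dest] = cycle
        longest = max(longest, cycle)
    return longest


# Serial chain: each ADD consumes the result of the one before it.
chain = [("R1", "R2", "R3"), ("R4", "R1", "R6"), ("R8", "R4", "R7")]
# After tree-height reduction: the first two ADDs are independent.
balanced = [("R1", "R2", "R3"), ("R4", "R6", "R7"), ("R8", "R1", "R4")]
```

Under these assumptions min_cycles(chain) is 3 and min_cycles(balanced) is 2, matching the counts given in the text.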
Recurrences are expressions
whose value in one iteration is
given by a function that depends
on the previous iteration.
A common type of recurrence occurs
in:
sum = sum + x;
Assume a loop containing this
recurrence is unrolled five times.
Let the values of x in these five
iterations be x1, x2, x3,
x4 and x5.
Then the value of sum at the end of
each unrolled iteration can be written as,
sum = sum + x1 + x2 + x3 + x4 + x5;
Unoptimized, this expression
requires five dependent
operations.
It can be rewritten as,
sum = ((sum + x1) + (x2 + x3)) + (x4 + x5);
This can be evaluated in only three
dependent operations.
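The two operation counts can be confirmed by treating each grouping as an expression tree and measuring its height. The representation below (strings for operands, 2-tuples for additions) and the function names are assumptions chosen for this sketch.

```python
def depth(expr):
    """Number of dependent additions on the longest path of the tree.

    A leaf is a string; an addition is a 2-tuple (left, right).
    """
    if isinstance(expr, str):
        return 0
    left, right = expr
    return 1 + max(depth(left), depth(right))


def evaluate(expr, values):
    """Evaluate the tree given a dict mapping leaf names to numbers."""
    if isinstance(expr, str):
        return values[expr]
    left, right = expr
    return evaluate(left, values) + evaluate(right, values)


# sum + x1 + x2 + x3 + x4 + x5, evaluated left to right
serial = ((((("sum", "x1"), "x2"), "x3"), "x4"), "x5")
# ((sum + x1) + (x2 + x3)) + (x4 + x5)
balanced = ((("sum", "x1"), ("x2", "x3")), ("x4", "x5"))
```

Here depth(serial) is 5 and depth(balanced) is 3, and both trees evaluate to the same result for integer inputs. For floating-point values, regrouping can change rounding, which is one reason compilers apply this transformation only when reassociation is permitted.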
Recurrences also arise from implicit
calculations, such as those associated
with array indexing.
With unrolling, these dependent
computations can be minimised.
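One common implicit recurrence, assumed here for illustration, is the loop index itself: i = i + 1 is recomputed in every iteration. The sketch below contrasts an unrolled body that still increments the index after every element with one that uses constant offsets from a single index and updates it once. Both function names are hypothetical.

```python
def sum_naive_unrolled(x):
    """Unrolled by 4, but the index is still bumped after every element."""
    total, i = 0, 0
    while i + 4 <= len(x):
        total += x[i]
        i += 1  # each increment depends on the previous one
        total += x[i]
        i += 1
        total += x[i]
        i += 1
        total += x[i]
        i += 1
    for value in x[i:]:  # leftover elements
        total += value
    return total


def sum_folded_index(x):
    """Unrolled by 4 with the index recurrence folded into offsets."""
    total, i = 0, 0
    while i + 4 <= len(x):
        # Offsets 0..3 are all derived from the same i, so the four
        # index calculations do not depend on each other.
        total += x[i] + x[i + 1] + x[i + 2] + x[i + 3]
        i += 4  # a single index update per unrolled body
    for value in x[i:]:  # leftover elements
        total += value
    return total
```

Both functions return the same sum; the second leaves only one dependent index update per unrolled body instead of four.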

Unit v detecting-and-enhancing-loop-level-parallelism-advance-computer-architecture-lecture-slides
