2. Introduction
Loop distribution eliminates loop-carried dependences by
executing the sources of all dependences before executing any
sinks.
Many carried dependences are caused by misaligned array references.
If all references can be aligned, those dependences disappear
and the loop can be run in parallel.
For example:
DO I = 2,N
A(I) = B(I)+C(I)
D(I) = A(I-1)*2.0
ENDDO
Created by Sumita Das
3. This loop cannot be run in parallel because the value of A
computed on iteration I is used on iteration I+1.
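The carried dependence can be observed by executing the iterations in a different order. The following is a minimal Python sketch, not part of the original slides; the helper name `run_loop` and the sample data are assumptions made for illustration. Lists are treated as 1-based to mirror the Fortran code, so index 0 is unused.

```python
def run_loop(order, a, b, c):
    """Execute the loop body once for each index in `order`.

    Index 0 of every list is unused so that indices match the Fortran code.
    """
    a = list(a)                  # copy so each run starts from the same state
    d = [0.0] * len(a)
    for i in order:
        a[i] = b[i] + c[i]       # A(I) = B(I) + C(I)
        d[i] = a[i - 1] * 2.0    # D(I) = A(I-1) * 2.0
    return a, d


n = 6
a0 = [0.0] * (n + 1)                             # initial contents of A
b = [0.0] + [float(i) for i in range(1, n + 1)]  # B(1..N)
c = [0.0] + [10.0] * n                           # C(1..N)

sequential = run_loop(range(2, n + 1), a0, b, c)
reversed_order = run_loop(range(n, 1, -1), a0, b, c)

# A ends up the same either way, but D differs: D(I) needs the value of
# A(I-1) that the previous iteration produces.
same_a = sequential[0] == reversed_order[0]
same_d = sequential[1] == reversed_order[1]
```

With this data `same_a` is true and `same_d` is false, which is what a dependence carried from iteration I to iteration I+1 predicts.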
The two statements can be aligned to compute and use the
values in the same iteration by adding an extra iteration and
adjusting the indices of one of the statements to produce:
DO I = 1,N
IF (I .GT. 1) A(I) = B(I)+C(I)
IF (I .LT. N) D(I+1) = A(I)*2.0
ENDDO
5. Alignment
Loop alignment does incur some overhead: one extra loop
iteration and the extra work required to test the conditionals.
This overhead can be reduced by executing the last iteration of
the first statement with the first iteration of the second
statement:
DO I = 2,N
J = MOD(I+N-4,N-1)+2
A(J) = B(J)+C(J)
D(I) = A(I-1)*2.0
ENDDO
6. For every iteration other than the first, J is one less than I, so
the assignment to A(J) produces the value A(I-1) that the second
statement uses in the same iteration.
On the first iteration, J = N, so the assignment to the last
location of A is correctly executed.
As a result, the total number of loop iterations is restored to its
original count, but there is still the overhead of the MOD
calculation.
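The index arithmetic can be checked with a short Python sketch. It is an illustration only: the function names and sample data are assumptions, C is assumed to be an array indexed like B (as in the original loop), and lists are 1-based with index 0 unused. For the non-negative operands used here, Python's `%` gives the same result as Fortran's MOD.

```python
def original(n, a, b, c):
    """Original loop: DO I = 2,N with A(I) and D(I) computed together."""
    a = list(a)
    d = [0.0] * (n + 1)
    for i in range(2, n + 1):
        a[i] = b[i] + c[i]
        d[i] = a[i - 1] * 2.0
    return a, d


def mod_aligned(n, a, b, c, order=None):
    """MOD-aligned loop; `order` optionally permutes the iterations."""
    a = list(a)
    d = [0.0] * (n + 1)
    for i in (range(2, n + 1) if order is None else order):
        j = (i + n - 4) % (n - 1) + 2   # J = MOD(I+N-4, N-1) + 2
        a[j] = b[j] + c[j]              # assumption: C indexed like B
        d[i] = a[i - 1] * 2.0
    return a, d


n = 6
a0 = [0.0, 5.0] + [0.0] * (n - 1)
b = [0.0] + [float(i) for i in range(1, n + 1)]
c = [0.0] + [10.0] * n

# J for I = 2..N: N on the first iteration, then I-1 afterwards.
js = [(i + n - 4) % (n - 1) + 2 for i in range(2, n + 1)]

expected = original(n, a0, b, c)
forward = mod_aligned(n, a0, b, c)
backward = mod_aligned(n, a0, b, c, order=range(n, 1, -1))
```

Because every iteration reads only the element of A that it wrote itself (or the untouched A(1) on the first iteration), running the iterations backward produces the same arrays as running them forward.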
7. Alternatively, the conditional statements can be eliminated
without adding calls to MOD by peeling off the first and last
executions for each of the statements, yielding:
D(2) = A(1) * 2.0
DO I = 2, N-1
A(I) = B(I) + C(I)
D(I+1) = A(I) * 2.0
ENDDO
A(N) = B(N) + C(N)
This form permits efficient parallelism with the added
overhead of two statements, one before and one after the
loop, that cannot be executed in parallel.
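The peeled form can be compared against the original loop with a small Python sketch. This is an illustration under stated assumptions, not code from the slides: the function names, the sample data and the shuffled iteration order are all made up for the check, and lists are 1-based with index 0 unused.

```python
def original_loop(n, a, b, c):
    """Original loop: DO I = 2,N."""
    a = list(a)
    d = [0.0] * (n + 1)
    for i in range(2, n + 1):
        a[i] = b[i] + c[i]
        d[i] = a[i - 1] * 2.0
    return a, d


def peeled_loop(n, a, b, c, order=None):
    """Peeled form; `order` optionally permutes the loop iterations."""
    a = list(a)
    d = [0.0] * (n + 1)
    d[2] = a[1] * 2.0                  # peeled first execution of statement 2
    body = range(2, n) if order is None else order
    for i in body:                     # DO I = 2, N-1
        a[i] = b[i] + c[i]
        d[i + 1] = a[i] * 2.0
    a[n] = b[n] + c[n]                 # peeled last execution of statement 1
    return a, d


n = 8
a0 = [0.0, 3.0] + [0.0] * (n - 1)
b = [0.0] + [float(i) for i in range(1, n + 1)]
c = [0.0] + [0.5] * n

expected = original_loop(n, a0, b, c)
in_order = peeled_loop(n, a0, b, c)
shuffled = peeled_loop(n, a0, b, c, order=[5, 2, 7, 3, 6, 4])
```

Both `in_order` and `shuffled` match `expected`, which is consistent with the loop body having no carried dependence once the two boundary statements are peeled.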
8. It is not possible to use alignment to eliminate all carried
dependences in a loop if the carried dependence is involved in
a recurrence, as the following example shows:
DO I = 1, N
A(I) = B(I) + C
B(I+1) = A(I) + D
ENDDO
In this example, the references to B create a carried
dependence.
For alignment to be successful in this case, we would need to
interchange the order of the two statements in the loop body.
9. However, the loop-independent dependence involving A
prevents interchanging the statements before alignment, so our
hope is that we can do the alignment and statement interchange in
a single step to eliminate the carried dependence:
DO I = 1, N+1
IF (I .NE. 1) B(I) = A(I-1) + D
IF (I .NE. N+1) A(I) = B(I) + C
ENDDO
Although B is now aligned, the references to A are misaligned,
creating a new carried dependence.
Looking at this example, it is reasonable to believe that loop
alignment cannot eliminate carried dependences in a recurrence.
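A quick Python sketch can confirm both halves of this observation: the aligned loop computes the same values as the original when run in order, yet it still depends on iteration order. The function names, scalar values and sample data are assumptions for illustration; lists are 1-based with index 0 unused.

```python
def original_recurrence(n, b, c_val, d_val):
    """Original loop: A(I) = B(I) + C ; B(I+1) = A(I) + D."""
    a = [0.0] * (n + 2)
    b = list(b)
    for i in range(1, n + 1):
        a[i] = b[i] + c_val
        b[i + 1] = a[i] + d_val
    return a, b


def aligned_recurrence(n, b, c_val, d_val, order=None):
    """Aligned and interchanged loop; `order` optionally permutes iterations."""
    a = [0.0] * (n + 2)
    b = list(b)
    for i in (range(1, n + 2) if order is None else order):
        if i != 1:
            b[i] = a[i - 1] + d_val    # reads A written by the previous iteration
        if i != n + 1:
            a[i] = b[i] + c_val
    return a, b


n = 5
b0 = [0.0, 1.0] + [0.0] * n            # only B(1) is initialised
expected = original_recurrence(n, b0, 1.0, 2.0)
forward = aligned_recurrence(n, b0, 1.0, 2.0)
backward = aligned_recurrence(n, b0, 1.0, 2.0, order=range(n + 1, 0, -1))
```

Here `forward` equals `expected`, while `backward` does not: aligning B only moved the carried dependence onto A.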
10. Theorem
Alignment, replication, and statement reordering are
sufficient to eliminate all carried dependences in a single
loop containing no recurrence, and in which the distance of
each dependence is a constant independent of the loop
index.
We can establish this constructively.
Let $G = (V, E, \delta)$ be a weighted graph, where each $v \in V$ is a
statement and $\delta(v_1, v_2)$ is the dependence distance
between $v_1$ and $v_2$. Let $o: V \to \mathbb{Z}$ give the offset of
each vertex.
$G$ is said to be carry free if $o(v_1) + \delta(v_1, v_2) = o(v_2)$ for every edge $(v_1, v_2) \in E$.
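The carry-free condition can be explored with a small Python sketch that tries to find offsets satisfying o(v1) + distance = o(v2) on every edge. This is not the alignment procedure itself: it performs no replication or reordering and only reports whether alignment alone succeeds. The function name `find_offsets`, the statement labels and the edge encoding `(source, sink, distance)` are assumptions for illustration.

```python
from collections import deque


def find_offsets(statements, edges):
    """Try to give every statement an integer offset so that
    o(v1) + distance == o(v2) holds for every edge (v1, v2, distance).

    Returns a dict of offsets, or None when two constraints conflict.
    """
    offsets = {}
    for start in statements:
        if start in offsets:
            continue
        offsets[start] = 0                 # arbitrary anchor per component
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for v1, v2, distance in edges:
                if v1 == v:
                    other, wanted = v2, offsets[v] + distance
                elif v2 == v:
                    other, wanted = v1, offsets[v] - distance
                else:
                    continue
                if other not in offsets:
                    offsets[other] = wanted
                    queue.append(other)
                elif offsets[other] != wanted:
                    return None            # constraints cannot all be met
    return offsets


# One carried dependence of distance 1 from S1 to S2: alignable.
simple = find_offsets(["S1", "S2"], [("S1", "S2", 1)])

# Recurrence: S1 -> S2 with distance 0 and S2 -> S1 with distance 1.
recurrence = find_offsets(["S1", "S2"], [("S1", "S2", 0), ("S2", "S1", 1)])

# Two dependences between the same statements with different distances.
conflict = find_offsets(["S1", "S2"], [("S1", "S2", 0), ("S1", "S2", 1)])
```

Only the first graph admits offsets; the recurrence and the two-distance case both return None, the latter being the situation in which the theorem relies on replication.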
11. Can carried dependences that are not involved in a recurrence
always be eliminated by alignment without introducing
new carried dependences?
No, because of the possibility of an alignment conflict: two or
more dependences that cannot be simultaneously aligned.
Consider the following example:
DO I = 1, N
A(I+1) = B(I) + C
X(I)= A(I+1) + A(I)
ENDDO
This loop contains two dependences involving the array A, one
loop-independent dependence and a loop-carried dependence.
12. If the statements are aligned to eliminate the carried
dependence, the following code results:
DO I = 0, N
IF (I .NE. 0) A(I+1) = B(I) + C
IF (I .NE. N) X(I+1)= A(I+2) + A(I+1)
ENDDO
The original loop-carried dependence has been eliminated,
but the process of eliminating it has transformed the original
loop-independent dependence into a loop-carried dependence.
The loop still cannot be correctly run in parallel.
13. Alignment Procedure
procedure Align(V, E, δ, o)
   while V is not empty
      remove element v from V
      for each (w,v) ∈ E
         if w ∈ V
            W ← W ∪ {w}
            o(w) ← o(v) - δ(w,v)
         else if o(w) != o(v) - δ(w,v)
            create vertex w'
            replace (w,v) with (w',v)
            replicate all edges into w onto w'
            W ← W ∪ {w'}
            o(w') ← o(v) - δ(w,v)
      for each (v,w) ∈ E
         if w ∈ V
            W ← W ∪ {w}
            o(w) ← o(v) + δ(v,w)
         else if o(w) != o(v) + δ(v,w)
            create vertex v'
            replace (v,w) with (v',w)
            replicate edges into v onto v'
            W ← W ∪ {v'}
            o(v') ← o(w) - δ(v,w)
end align
14. References
[1] Randy Allen and Ken Kennedy, "Optimizing Compilers for Modern
Architectures", Chapter 6: Creating Coarse-Grained Parallelism,
1st Edition.