Upcoming SlideShare
×

# Lec2 final

588 views

Published on

Published in: Technology
0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

Views
Total views
588
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
15
0
Likes
0
Embeds 0
No embeds

No notes for slide
• I thought it was interesting enough to point out that the term Systolic actually refers to Systole which essentially means the “rhythmic beating of the heart.” This is appropriate because of the way a systolic array fans in values and relies on them being in a specific place at a specific time so that they can be “pumped” through the system.
• ### Lec2 final

1. 1. Systolic Arrays & Their Applications
2. 2. Overview What it is N-body problem Matrix multiplication Cannon’s method Other applications Conclusion
3. 3. Introduction – “Systolic”
4. 4. What Is a Systolic Array?A systolic array is an arrangement of processors in an arraywhere data flows synchronously across the array betweenneighbors, usually with different data flowing in different directionsEach processor at each step takes in data from one or moreneighbors (e.g. North and West), processes it and, in the nextstep, outputs results in the opposite direction (South and East).H. T. Kung and Charles Leiserson were the first to publisha paper on systolic arrays in 1978, and coined the name.
5. 5. What Is a Systolic Array? A specialized form of parallel computing. Multiple processors connected by short wires. Unlike many forms of parallelism which lose speed through their connection. Cells(processors), compute data and store it independently of each other.
6. 6. Systolic Unit(cell) Each unit is an independent processor. Every processor has some registers and an ALU. The cells share information with their neighbors, after performing the needed operations on the data.
7. 7. Some simple examples of systolic array models.
8. 8. N-body Problem Conventional method: N2 Systolic method: N We will need one processing element for each body, lets call them planets.
9. 9. B2 B3B1 B4 B6 B5Six planets in orbit with different masses, and different forcesacting on each other, so are systolic array will look like this.
10. 10. Each processor will hold the newtonian force formula:Fij = kmimj/d2ij , k being the gravitational constant.Then load the array with values. B1 B2 B3 B4 B5 B6 Now that the array has the mass and the coordinates of Each body we can begin are computation.
11. 11. B6, B5, B4, B3, B2, B1B1m B2m B3m B4m B5m B6mB1c B2c B3c B4c B5c B6c Fij = kmimj/d2ij
12. 12. B6, B5, B4, B3, B2B1m B2m B3m B4m B5m B6*B1B1c B2c B3c B4c B5c Fij = kmimj/d2ij
13. 13. B6, B5, B4, B3B1m B2m B3m B4m B5*B1 B6*B2B1c B2c B3c B4c Fij = kmimj/d2ij
14. 14. B6, B5, B4B1m B2m B3m B4*B1 B5*B2 B6*B3B1c B2c B3c Fij = kmimj/d2ij
15. 15. B6, B5B1m B2m B3*B1 B4*B2 B5*B3 B6*B4B1c B2c Fij = kmimj/d2ij
16. 16. B6B1m B2*B1 B3*B2 B4*B3 B5*B4 B6*B5B1c Fij = kmimj/d2ij
17. 17. Done Done Done Done Done Done Fij = kmimj/d2ij
18. 18. Matrix Multiplicationa11 a12 a13 b11 b12 b13 c11 c12 c13a21 a22 a23a31 a32 a33 * b21 b22 b23 b31 b32 b33 = c21 c22 c23 c31 c32 c33Conventional Method: N3For I = 1 to N For J = 1 to N For K = 1 to N C[I,J] = C[I,J] + A[J,K] * B[K,J];
19. 19. Systolic MethodThis will run in O(n) time!To run in N time we need N x N processing units, in this casewe need 9. P1 P2 P3 P4 P5 P6 P7 P8 P9
20. 20. We need to modify the input data, like so: a13 a12 a11Flip columns 1 & 3 a23 a22 a21 a33 a32 a31 b31 b32 b33 Flip rows 1 & 3 b21 b22 b23 b11 b12 b13 and finally stagger the data sets for input.
21. 21. b33 b32 b23 b31 b22 b13 b21 b12 b11 a13 a12 a11 P1 P2 P3 a23 a22 a21 P4 P5 P6a33 a32 a31 P7 P8 P9At every tick of the global system clock data is passed to eachprocessor from two different directions, then it is multipliedand the result is saved in a register.
22. 22. 3 4 2 3 4 2 23 36 282 5 33 2 5 * 2 5 3 3 2 5 = 25 39 34 28 32 37Lets try this using a systolic array. 5 2 3 3 5 2 2 4 3 2 4 3 P1 P2 P3 3 5 2 P4 P5 P6 5 2 3 P7 P8 P9
23. 23. Clock tick: 1 5 2 3 3 5 2 2 4 2 4 3*3 3 5 2 5 2 3 P1 P2 P3 P4 P5 P6 P7 P8 P99 0 0 0 0 0 0 0 0
24. 24. Clock tick: 2 5 2 3 3 5 2 2 4*2 3*4 3 5 2*3 5 2 3 P1 P2 P3 P4 P5 P6 P7 P8 P917 12 0 6 0 0 0 0 0
25. 25. Clock tick: 3 5 2 3 2*3 4*5 3*2 3 5*2 2*4 5 2 3*3 P1 P2 P3 P4 P5 P6 P7 P8 P923 32 6 16 8 0 9 0 0
26. 26. Clock tick: 4 5 2*2 4*3 3*3 5*5 2*2 5 2*2 3*4 P1 P2 P3 P4 P5 P6 P7 P8 P923 36 18 25 33 4 13 12 0
27. 27. Clock tick: 5 2*5 3*2 5*3 5*3 5*2 3*2 P1 P2 P3 P4 P5 P6 P7 P8 P923 36 28 25 39 19 28 22 6
28. 28. Clock tick: 6 3*5 5*2 2*3 P1 P2 P3 P4 P5 P6 P7 P8 P923 36 28 25 39 34 28 32 12
29. 29. Clock tick: 7 5*5 P1 P2 P3 P4 P5 P6 P7 P8 P923 36 28 25 39 34 28 32 37
30. 30. Same answer! In 2n + 1 time, can we do better? The answer is yes, there is an optimization. 23 36 28 25 39 34 28 32 37 P1 P2 P3 P4 P5 P6 P7 P8 P923 36 28 25 39 34 28 32 37
31. 31. Cannon’s Technique No processor is idle. Instead of feeding the data, it is cycled. Data is staggered, but slightly different than before. Not including the loading which can be done in parallel for a time of 1, this technique is effectively N.
32. 32. Other Applications Signal processing Image processing Solutions of differential equations Graph algorithms Biological sequence comparison Other computationally intensive tasks
33. 33. Samba: Systolic Accelerator for Molecular Biological ApplicationsThis systolic array contains 128 processors shared into 32 full customVLSI chips. One chip houses 4 processors, and one processor performs10 millions matrix cells per second.
34. 34. Why Systolic? Extremely fast. Easily scalable architecture. Can do many tasks single processor machines cannot attain. Turns some exponential problems into linear or polynomial time.
35. 35. Why Not Systolic? Expensive. Not needed on most applications, they are a highly specialized processor type. Difficult to implement and build.