# Parallel algorithms

1. Parallel algorithms. Parallel and Distributed Computing. Wrocław, 07.05.2010. Paweł Duda
2. Parallel algorithm (definition): A parallel algorithm is an algorithm that has been specifically written for execution on a computer with two or more processing units.
3. Parallel algorithms
   - can also be run on computers with a single processor (multiple functional units, pipelined functional units, pipelined memory systems)
4. Modelling algorithms 1
   - when designing an algorithm, take into account the cost of communication and the number of processors (efficiency)
   - the designer usually uses an abstract model of computation called the parallel random-access machine (PRAM)
   - each CPU operation counts as one step
   - the model's advantages
5. Modelling algorithms 2: PRAM
   - neglects such issues as synchronisation and communication
   - no limit on the number of processors in the machine
   - any memory location is uniformly accessible from any processor
   - no limit on the amount of shared memory in the system
6. Modelling algorithms 3: PRAM
   - no conflict in accessing resources
   - programs written for these machines are generally MIMD
7. Multiprocessor model
8. Parallel algorithms
   - Multiprocessor model
9. Work-depth model
   - How can the cost of an algorithm be calculated?
   - Work W: the total number of operations
   - Depth D: the longest chain of dependencies
   - P = W/D: the parallelism of the algorithm
   - Picture: summing 16 numbers on a tree. The total depth is 4 and the total work is 15, so P = 15/4 = 3.75.
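The work and depth figures quoted for the 16-number tree sum can be checked with a short Python sketch. The function name `tree_sum` and the level-by-level loop are illustrative assumptions, not taken from the slides; the loop only simulates the tree sequentially and counts what a parallel machine would do.

```python
def tree_sum(values):
    """Sum values pairwise, level by level, as on a balanced binary tree.

    Returns (total, work, depth): work counts additions, depth counts
    levels. All additions within one level are independent of each other.
    """
    level = list(values)
    work = 0
    depth = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] + level[i + 1])
            work += 1
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd element is carried up unchanged
        level = nxt
        depth += 1
    return (level[0] if level else 0), work, depth


total, work, depth = tree_sum(range(1, 17))
print(total, work, depth, work / depth)  # prints: 136 15 4 3.75
```

With 16 inputs the levels perform 8, 4, 2 and 1 additions, which gives the work of 15 and the depth of 4 stated on the slide.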
10. Mergesort
    - Input: a sequence of n keys
    - Output: the sorted sequence of n keys
    - Conceptually, a merge sort works as follows:
      - If the list has length 0 or 1, it is already sorted.
      - Otherwise:
        - Divide the unsorted list into two sublists of about half the size.
        - Sort each sublist recursively by re-applying merge sort.
        - Merge the two sublists back into one sorted list.
11. Mergesort
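The merge sort steps can be written out as a minimal sequential Python sketch. The function names are illustrative. The two recursive calls do not depend on each other, which is where a parallel version could create separate tasks; that part is only noted in a comment here, not implemented.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])   # at most one of these two tails is non-empty
    out.extend(right[j:])
    return out


def merge_sort(keys):
    """Return a new sorted list containing the given keys."""
    if len(keys) <= 1:
        return list(keys)          # already sorted
    mid = len(keys) // 2
    # The two halves are independent; a parallel version could sort
    # them as separate tasks before merging.
    left = merge_sort(keys[:mid])
    right = merge_sort(keys[mid:])
    return merge(left, right)


print(merge_sort([5, 2, 9, 1, 5, 6]))  # prints: [1, 2, 5, 5, 6, 9]
```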
12. General-purpose computing on graphics processing units (GPGPU)
    - a recent trend
    - GPUs used as co-processors
    - linear algebra (matrix operations)
    - Picture: Nvidia's Tesla GPGPU card
13. Matrix multiplication
    - Algorithm: MATRIX_MULTIPLY(A, B)
      - (l, m) := dimensions(A)
      - (m, n) := dimensions(B)
      - in parallel for i ∈ [0..l) do
        - in parallel for j ∈ [0..n) do
          - R_ij := sum({ A_ik * B_kj : k ∈ [0..m) })
    - A procedure that needs log n such matrix multiplications, each taking O(n³) time serially, has a serial complexity of O(n³ log n).
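A minimal Python sketch of MATRIX_MULTIPLY follows, assuming matrices are stored as non-empty lists of rows. It runs sequentially; the comments mark where the two `in parallel for` loops would apply, since every entry of the result is computed independently.

```python
def matrix_multiply(a, b):
    """Multiply an l x m matrix by an m x n matrix (lists of rows)."""
    l, m = len(a), len(a[0])
    m_b, n = len(b), len(b[0])
    if m != m_b:
        raise ValueError("inner dimensions do not match")
    # Each (i, j) entry depends only on row i of a and column j of b,
    # so all l * n entries could be computed in parallel.
    return [
        [sum(a[i][k] * b[k][j] for k in range(m)) for j in range(n)]
        for i in range(l)
    ]


print(matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# prints: [[19, 22], [43, 50]]
```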
14. Search
    - Dynamic creation of tasks and channels during program execution
    - Looking for nodes corresponding to 'solutions'
    - Initially, a task is created for the root of the tree

        procedure search(A)
        begin
          if (solution(A)) then
            score = eval(A);
            report solution and score
          else
            foreach child A(i) of A
              search(A(i))
            endfor
          endif
        end
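The search procedure can be sketched in Python as below. The toy problem (bit strings of length 3 with exactly two ones) and the helper names `is_solution`, `evaluate` and `children` are hypothetical stand-ins for `solution`, `eval` and the child enumeration on the slide. The recursion is sequential here; in the task-parallel formulation each recursive call would be a newly created task.

```python
def search(node, is_solution, evaluate, children, found):
    """Depth-first search that records (node, score) for every solution."""
    if is_solution(node):
        found.append((node, evaluate(node)))
    else:
        for child in children(node):
            # In the parallel version, each call would be a new task.
            search(child, is_solution, evaluate, children, found)


# Hypothetical toy problem: bit strings of length 3 with exactly two 1s.
def is_solution(node):
    return len(node) == 3 and sum(node) == 2


def evaluate(node):
    return int("".join(map(str, node)), 2)   # score = binary value


def children(node):
    return [] if len(node) == 3 else [node + (0,), node + (1,)]


found = []
search((), is_solution, evaluate, children, found)
print(found)  # prints: [((0, 1, 1), 3), ((1, 0, 1), 5), ((1, 1, 0), 6)]
```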
15. Shortest-path algorithms
    - The all-pairs shortest-path problem involves finding the shortest path between all pairs of vertices in a graph.
    - A graph G = (V, E) comprises a set V of N vertices {v_i} and a set E ⊆ V × V of edges.
    - In a directed graph, the edges (v_i, v_j) and (v_j, v_i), i ≠ j, are distinct.
    - Picture: a simple directed graph, G, and its adjacency matrix, A.
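As a small illustration of a directed graph and its adjacency matrix, the Python sketch below builds the matrix from an edge list. The three-vertex graph is a made-up example, not the graph from the slide's picture.

```python
def adjacency_matrix(n, edges):
    """Return the n x n 0/1 adjacency matrix of a directed graph."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1          # edge from vertex i to vertex j
    return a


# Hypothetical directed graph on vertices 0, 1, 2.
edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
A = adjacency_matrix(3, edges)
print(A)  # prints: [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
```

The matrix is not symmetric: the edge (0, 1) is present while (1, 0) is not, which reflects that edge direction matters.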
16. Floyd's algorithm: Floyd's algorithm is a graph analysis algorithm for finding shortest paths in a weighted graph. A single execution of the algorithm finds the shortest paths between all pairs of vertices.
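A minimal sequential sketch of Floyd's algorithm in Python follows, assuming the graph is given as a weight matrix with 0 on the diagonal, `INF` for missing edges, and no negative cycles. The example weights are hypothetical. The function returns path lengths only, not the paths themselves.

```python
INF = float("inf")


def floyd(weights):
    """Return the matrix of shortest-path lengths between all vertex pairs."""
    n = len(weights)
    dist = [row[:] for row in weights]       # do not modify the input
    for k in range(n):                       # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Hypothetical weighted directed graph on 4 vertices.
W = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
print(floyd(W))
# prints: [[0, 3, 5, 6], [5, 0, 2, 3], [3, 6, 0, 1], [2, 5, 7, 0]]
```

The triple loop performs N³ steps for N vertices; the parallel variants distribute the two inner loops across tasks.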
17. Parallel Floyd's algorithm 1
    - The first parallel Floyd algorithm is based on a one-dimensional, row-wise domain decomposition of the intermediate matrix I and the output matrix S.
    - The algorithm can use at most N processors.
    - Each task has one or more adjacent rows of I and is responsible for performing computation on those rows.
18. Parallel Floyd's algorithm 1: Parallel version of Floyd's algorithm based on a one-dimensional decomposition of the I matrix. In (a), the data allocated to a single task are shaded: a contiguous block of rows. In (b), the data required by this task in the kth step of the algorithm are shaded: its own block and the kth row.
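The row-wise decomposition can be simulated sequentially in Python, as a sketch under stated assumptions: `p` tasks each hold a contiguous block of rows, and in step k the owner of row k hands a copy of it to every task (standing in for a broadcast). The names `block_ranges` and `floyd_rowwise` and the example weights are hypothetical, and no real message passing takes place.

```python
INF = float("inf")


def block_ranges(n, p):
    """Split range(n) into p contiguous blocks of nearly equal size."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for t in range(p):
        size = base + (1 if t < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def floyd_rowwise(weights, p):
    """Simulate Floyd's algorithm with p tasks that each own a block of rows."""
    n = len(weights)
    if not 1 <= p <= n:
        raise ValueError("need 1 <= p <= N tasks")
    blocks = block_ranges(n, p)
    # Each task stores only its own rows.
    local = [{i: weights[i][:] for i in rows} for rows in blocks]
    for k in range(n):
        owner = next(t for t, rows in enumerate(blocks) if k in rows)
        row_k = local[owner][k][:]           # stands in for broadcasting row k
        for t in range(p):                   # tasks are independent within a step
            for row in local[t].values():
                d_ik = row[k]
                for j in range(n):
                    if d_ik + row_k[j] < row[j]:
                        row[j] = d_ik + row_k[j]
    return [local[t][i] for t, rows in enumerate(blocks) for i in rows]


W = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
print(floyd_rowwise(W, 2))
# prints: [[0, 3, 5, 6], [5, 0, 2, 3], [3, 6, 0, 1], [2, 5, 7, 0]]
```

Because each task needs at least one row, the number of tasks cannot exceed N, which matches the limit stated on the slide.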
19. Parallel Floyd's algorithm 2
    - An alternative parallel version of Floyd's algorithm uses a two-dimensional decomposition of the various matrices.
    - This version allows the use of up to N² processors.
20. Parallel Floyd's algorithm 2: Parallel version of Floyd's algorithm based on a two-dimensional decomposition of the I matrix. In (a), the data allocated to a single task are shaded: a contiguous submatrix. In (b), the data required by this task in the kth step of the algorithm are shaded: its own block, and part of the kth row and column.
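The two-dimensional decomposition can be simulated in the same spirit. In the sketch below, a hypothetical q × q grid of tasks each owns a contiguous submatrix; in step k a task reads only its own block plus its segments of the kth row and kth column. The matrix is kept in one shared list purely for the simulation, and the function name and example weights are assumptions.

```python
INF = float("inf")


def floyd_2d(weights, q):
    """Simulate Floyd's algorithm on a q x q grid of tasks (q*q <= N*N)."""
    n = len(weights)
    if not 1 <= q <= n:
        raise ValueError("need 1 <= q <= N")
    bounds = [n * t // q for t in range(q + 1)]      # block boundaries
    dist = [row[:] for row in weights]
    for k in range(n):
        row_k = dist[k][:]                           # snapshot of the kth row
        col_k = [dist[i][k] for i in range(n)]       # snapshot of the kth column
        for bi in range(q):
            for bj in range(q):
                # Task (bi, bj) updates only its own submatrix, using its
                # segments of col_k (its rows) and row_k (its columns).
                for i in range(bounds[bi], bounds[bi + 1]):
                    for j in range(bounds[bj], bounds[bj + 1]):
                        via_k = col_k[i] + row_k[j]
                        if via_k < dist[i][j]:
                            dist[i][j] = via_k
    return dist


W = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
print(floyd_2d(W, 2))
# prints: [[0, 3, 5, 6], [5, 0, 2, 3], [3, 6, 0, 1], [2, 5, 7, 0]]
```

With q = N every task owns a single matrix entry, which corresponds to the upper limit of N² processors.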
21. Thank you for your attention