
Parallelizing matrix multiplication


  1. PARALLELIZING MATRIX MULTIPLICATION IN COMPILER DESIGN BY M.SRI NANDHINI, II-MSC(CS), NADAR SARASWATHI COLLEGE OF ARTS & SCIENCE, VADAPUTHUPATTI, THENI.
  2. MATRIX MULTIPLICATION • Consider two matrices A and B. Since we work with square matrices, n = m = p. • Each element of the resultant matrix AB is obtained as • (AB)ij = ∑k Aik Bkj.
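The formula above can be sketched directly in C. This is a minimal illustration, not the deck's own code; the function name `multiply_element` is hypothetical, and matrices are assumed to be arrays of row pointers as in the later slides.

```c
#include <stddef.h>

/* Computes one element of the product: (AB)[i][j] = sum over k of A[i][k] * B[k][j],
   for square n x n matrices stored as arrays of row pointers. */
double multiply_element(double **A, double **B, size_t i, size_t j, size_t n) {
    double sum = 0.0;
    for (size_t k = 0; k < n; k++) {
        sum += A[i][k] * B[k][j];
    }
    return sum;
}
```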
  3. GENERATE RANDOM SQUARE MATRIX • Let's get into the implementation by creating random matrices for multiplication. Here we use the malloc function to allocate memory dynamically on the heap, because testing requires matrices of different dimensions. • The data type is defined as double, which can be changed according to the use case. The #pragma omp parallel for directive parallelizes the loop so that we can initialize the matrix more efficiently.
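A generator along these lines might look as follows. This is a sketch under the slide's assumptions (TYPE defined as double, heap allocation via malloc, parallel initialization); note that rand() is not guaranteed thread-safe, so a per-thread generator would be safer in production code.

```c
#include <stdlib.h>

#define TYPE double  /* element type, as on the slide; change per use case */

/* Allocates an n x n matrix on the heap and fills it with random
   values in [0, 1]. The pragma splits row initialization across
   threads when compiled with OpenMP, and is ignored otherwise. */
TYPE **randomSquareMatrix(int dimension) {
    TYPE **matrix = malloc(dimension * sizeof(TYPE *));
    for (int i = 0; i < dimension; i++) {
        matrix[i] = malloc(dimension * sizeof(TYPE));
    }
    #pragma omp parallel for
    for (int i = 0; i < dimension; i++) {
        for (int j = 0; j < dimension; j++) {
            matrix[i][j] = rand() / (TYPE)RAND_MAX;
        }
    }
    return matrix;
}
```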
  4. TRADITIONAL MATRIX MULTIPLICATION • Without considering performance, the direct implementation of matrix multiplication is given below. • Operations occur sequentially for each element of the resultant matrix. • Here matrix A and matrix B are the input matrices and matrix C is the resultant matrix, so we pass the resultant matrix into the function as a reference.
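The direct implementation described above can be sketched like this; the function name `sequentialMultiply` is assumed, and C is filled in place as the slide describes.

```c
/* Straightforward triple-loop multiplication: C = A * B for square
   matrices of the given dimension. The caller passes in C, which is
   (re)initialized and filled element by element, sequentially. */
void sequentialMultiply(double **A, double **B, double **C, int dimension) {
    for (int i = 0; i < dimension; i++) {
        for (int j = 0; j < dimension; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < dimension; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}
```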
  5. MATRIX MULTIPLICATION USING PARALLEL FOR LOOPS • To implement loop parallelization in your algorithm, you can use a library like OpenMP to make the hard work easy, or you can write your own implementation with threads, in which case you have to handle load balancing, race conditions, and so on yourself.
  6. PROGRAM
     double parallelMultiply(TYPE **matrixA, TYPE **matrixB, TYPE **matrixC, int dimension) {
         struct timeval t0, t1;
         gettimeofday(&t0, 0);
         #pragma omp parallel for
         for (int i = 0; i < dimension; i++) {
             for (int j = 0; j < dimension; j++) {
                 for (int k = 0; k < dimension; k++) {
                     matrixC[i][j] += matrixA[i][k] * matrixB[k][j];
  7. Cont…
                 }
             }
         }
         gettimeofday(&t1, 0);
         double elapsed = (t1.tv_sec - t0.tv_sec) * 1.0f
                        + (t1.tv_usec - t0.tv_usec) / 1000000.0f;
         return elapsed;
     }
  8. OPTIMIZED MATRIX MULTIPLICATION USING PARALLEL FOR LOOPS • Since our matrices are stored on the heap, accessing them is slower than accessing data on the stack. It is better to bring that data from the heap to the stack before starting the multiplication process, so we set up containers for it first: • TYPE flatA[MAX_DIM]; • TYPE flatB[MAX_DIM];
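A flattening step consistent with the slide might look like this. It is a sketch, not the deck's exact code: the function name `convert` and the row-major layout are assumptions, as is storing B transposed in flatB (a common trick that makes the later inner loop read both operands sequentially). Arrays this large get static storage rather than true stack storage, since 2000×2000 doubles exceed typical stack limits.

```c
#define MAX_DIM (2000 * 2000)  /* largest test dimension on the slides */
#define TYPE double

/* Contiguous buffers for the flattened copies of the inputs. */
static TYPE flatA[MAX_DIM];
static TYPE flatB[MAX_DIM];

/* Row-major flatten: A's element (i, j) lands at index i * dimension + j.
   B is stored transposed, so B's column j becomes a contiguous run. */
void convert(TYPE **matrixA, TYPE **matrixB, int dimension) {
    for (int i = 0; i < dimension; i++) {
        for (int j = 0; j < dimension; j++) {
            flatA[i * dimension + j] = matrixA[i][j];
            flatB[i * dimension + j] = matrixB[j][i];
        }
    }
}
```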
  9. STEPS OF OPTIMIZED MATRIX MULTIPLICATION IMPLEMENTATION 1. Put common calculations in one place: small calculations repeated throughout a program are usually left alone when clarity matters more than performance, but here we compute them once. 2. Cache-friendly algorithm implementation: memory has a linear arrangement, so every N-dimensional array is
  10. Cont… ordered sequentially inside memory, and accessing it in that order makes better use of the cache. 3. Using stack vs. heap memory efficiently: • Stack access is faster than heap access, but the stack has limited memory. We store the large input matrices in heap memory and use the stack, with predefined allocations, for efficient intermediate calculations.
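Putting the three steps above together, an optimized multiply might look like the sketch below. The function name and the parameter layout are assumptions: it reads flattened, row-major copies of the inputs, with the second operand already transposed so that both inner-loop reads are sequential in memory (step 2), and it accumulates into a local variable instead of recomputing the output index on every iteration (steps 1 and 3).

```c
/* Cache-friendly multiply over flattened copies of the inputs.
   flatA is A in row-major order; flatBT is B transposed, so row j of
   flatBT is column j of B. The pragma parallelizes the outer loop
   when compiled with OpenMP and is ignored otherwise. */
void optimizedMultiply(const double *flatA, const double *flatBT,
                       double *flatC, int dimension) {
    #pragma omp parallel for
    for (int i = 0; i < dimension; i++) {
        for (int j = 0; j < dimension; j++) {
            double sum = 0.0;  /* local accumulator: one write to flatC per element */
            for (int k = 0; k < dimension; k++) {
                sum += flatA[i * dimension + k] * flatBT[j * dimension + k];
            }
            flatC[i * dimension + j] = sum;
        }
    }
}
```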
  11. Cont… • Here we launch 40 threads to do the multiplication. Since we are dealing with dimensions of 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800 and 2000, the workload can be divided equally among the threads. • In OpenMP we have explicitly declared matrixC as a shared resource; because each thread writes only its own rows of matrixC, this avoids race conditions.
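The thread count and the shared clause described above could be expressed as in this sketch (the function name is hypothetical). The `#ifdef _OPENMP` guards let the same code compile serially without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Multiply with an explicit thread count and matrixC declared shared.
   Each iteration of the outer loop writes a distinct row of matrixC,
   so the shared matrix is updated without races. With the slide's
   dimensions (multiples of 200), 40 threads get equal row shares. */
void parallelMultiply40(double **matrixA, double **matrixB,
                        double **matrixC, int dimension) {
#ifdef _OPENMP
    omp_set_num_threads(40);
#endif
    #pragma omp parallel for shared(matrixC)
    for (int i = 0; i < dimension; i++) {
        for (int j = 0; j < dimension; j++) {
            double sum = 0.0;
            for (int k = 0; k < dimension; k++) {
                sum += matrixA[i][k] * matrixB[k][j];
            }
            matrixC[i][j] = sum;
        }
    }
}
```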
  12. THANK YOU!!!
