Nadar Saraswathi College of Arts
and Science, Theni
Parallelizing Matrix Multiplication
Compiler Design
S. Subha Thilagam
II MSc (CS)
Matrix Multiplication
• Let’s consider two arbitrary matrices A (n × m) and B (m × p). Since
the matrices are square, n = m = p.
• So, each element of the resultant matrix AB can be obtained like
this: (AB)[i][j] = Σₖ A[i][k] · B[k][j], for k = 0 … n−1.
Generate Random Square Matrix
• Let’s get into the implementation by creating random
matrices for multiplication. Here we use the malloc
function to allocate memory dynamically on the heap,
because testing requires matrices
of different dimensions.
• Here we have defined the data type as double, which
can be changed according to the use case.
The #pragma omp parallel for directive parallelizes the
loop so that we can initialize the matrix
more efficiently.
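• The generator itself is not shown on the slide; a minimal sketch
matching the description above (heap allocation with malloc, a double
TYPE, and a parallelized initialization loop) might look like this.
The name randomSquareMatrix is an assumption, not from the slides.

#include <stdlib.h>

#define TYPE double

TYPE** randomSquareMatrix(int dimension){
    /* Allocate the rows on the heap so the dimension can vary per test run. */
    TYPE** matrix = malloc(dimension * sizeof(TYPE*));
    for(int i = 0; i < dimension; i++){
        matrix[i] = malloc(dimension * sizeof(TYPE));
    }

    /* Parallelize the initialization across threads. Note that rand()
       is not guaranteed to be thread-safe; a per-thread generator such
       as rand_r() would be more robust. */
    #pragma omp parallel for
    for(int i = 0; i < dimension; i++){
        for(int j = 0; j < dimension; j++){
            matrix[i][j] = rand() / (TYPE)RAND_MAX;
        }
    }
    return matrix;
}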
Traditional Matrix Multiplication
• Without worrying much about performance, the
direct implementation of matrix multiplication is
given below.
• Operations occur sequentially for each
element of the resultant matrix.
• Here matrixA and matrixB are the input matrices
and matrixC is the resultant matrix, so we have to
pass the resultant matrix into the function by
reference (as a pointer), as the sketch below shows.
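• A sketch of that direct implementation, mirroring the parallel
version shown on the next slide (the name sequentialMultiply and the
zero-initialization of matrixC by the caller are assumptions):

#include <sys/time.h>

double sequentialMultiply(TYPE** matrixA, TYPE** matrixB, TYPE** matrixC, int dimension){
    struct timeval t0, t1;
    gettimeofday(&t0, 0);

    /* Plain triple loop: every element of matrixC is computed in turn. */
    for(int i = 0; i < dimension; i++){
        for(int j = 0; j < dimension; j++){
            for(int k = 0; k < dimension; k++){
                matrixC[i][j] += matrixA[i][k] * matrixB[k][j];
            }
        }
    }

    gettimeofday(&t1, 0);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1000000.0;
}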
Matrix Multiplication Using Parallel
For Loops
• When you are going to implement loop parallelization in
your algorithm, you can use a library like OpenMP to
make the hard work easy, or you can write your own
implementation with threads, where you have to handle
load balancing, race conditions, etc.
• For this tutorial I am going to stick with the OpenMP
library.
• We just need to add a few lines to make this
parallel.
/* Compile with OpenMP enabled, e.g. gcc -fopenmp matmul.c */
#include <sys/time.h>

double parallelMultiply(TYPE** matrixA, TYPE** matrixB, TYPE** matrixC, int dimension){
    struct timeval t0, t1;
    gettimeofday(&t0, 0);

    /* Split the outer loop over rows among the threads; matrixC is
       assumed to be zero-initialized by the caller. */
    #pragma omp parallel for
    for(int i = 0; i < dimension; i++){
        for(int j = 0; j < dimension; j++){
            for(int k = 0; k < dimension; k++){
                matrixC[i][j] += matrixA[i][k] * matrixB[k][j];
            }
        }
    }

    gettimeofday(&t1, 0);
    double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1000000.0;
    return elapsed;
}
Optimized Matrix Multiplication
Using Parallel For Loops
• Since our matrices are stored on the heap, they are not as fast
to access as data stored on the stack. It is better to bring the
data from the heap to the stack before starting the multiplication
process, so we need to set up stack containers for it initially:
• TYPE flatA[MAX_DIM];
TYPE flatB[MAX_DIM];
The steps of the optimized matrix multiplication
implementation are given below.
1. Put common calculations in one place
Most of the time we ignore small redundant calculations in code
where performance is not required but clarity is; in this hot
loop, however, hoisting them out pays off, as the sketch below shows.
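• For example, with the flattened buffers introduced below, the offsets
i * dimension and j * dimension would otherwise be recomputed on every
iteration of the innermost loop; a small illustrative sketch (iOff,
jOff and sum are hypothetical names):

for(int i = 0; i < dimension; i++){
    int iOff = i * dimension;          /* computed once per row    */
    for(int j = 0; j < dimension; j++){
        int jOff = j * dimension;      /* computed once per column */
        TYPE sum = 0;                  /* accumulate locally       */
        for(int k = 0; k < dimension; k++){
            sum += flatA[iOff + k] * flatB[jOff + k];
        }
        matrixC[i][j] = sum;
    }
}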
2. Cache-friendly algorithm
implementation
• We all know that memory has a linear arrangement, so
every N-dimensional array is stored sequentially in
memory. In this example we can convert the two-dimensional
input matrices into a row-major and a column-major
one-dimensional array, so that the innermost loop reads
both operands sequentially and stays cache friendly.
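• A sketch of that conversion, filling the flatA and flatB buffers
declared earlier (the function names are assumptions):

/* matrixA is flattened row by row: element (i, j) lands at i * dimension + j. */
void convertToRowMajor(TYPE** matrix, TYPE* flat, int dimension){
    for(int i = 0; i < dimension; i++){
        for(int j = 0; j < dimension; j++){
            flat[i * dimension + j] = matrix[i][j];
        }
    }
}

/* matrixB is flattened column by column: element (i, j) lands at
   j * dimension + i, so the inner product in the multiply loop can
   read both buffers sequentially. */
void convertToColumnMajor(TYPE** matrix, TYPE* flat, int dimension){
    for(int i = 0; i < dimension; i++){
        for(int j = 0; j < dimension; j++){
            flat[j * dimension + i] = matrix[i][j];
        }
    }
}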
3. Using Stack vs. Heap Memory
Efficiently
• It is faster to access the stack than heap memory, but the
stack has limited space. We have stored the large
input matrices in heap memory; for efficient
intermediate calculations we use stack buffers with
predefined sizes.
• Here we launch 40 threads to do the
multiplication. Since we are dealing with
dimensions of 200, 400, 600, 800, 1000, 1200, 1400,
1600, 1800 and 2000, the workload can be divided equally
among the threads.
• In the OpenMP directive we have explicitly declared
matrixC as a shared resource; each thread writes to a
distinct set of rows, so no race conditions occur. A
combined sketch follows.
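• Putting the three steps together, a sketch of the optimized routine
(optimizedParallelMultiply is an assumed name; flatA, flatB and the
conversion helpers are those sketched above):

#include <sys/time.h>

double optimizedParallelMultiply(TYPE** matrixA, TYPE** matrixB, TYPE** matrixC, int dimension){
    struct timeval t0, t1;
    gettimeofday(&t0, 0);

    /* Step 2: flatten A row-major and B column-major for sequential reads. */
    convertToRowMajor(matrixA, flatA, dimension);
    convertToColumnMajor(matrixB, flatB, dimension);

    /* Step 3 timing loop: 40 threads, matrixC explicitly declared shared. */
    #pragma omp parallel for num_threads(40) shared(matrixC)
    for(int i = 0; i < dimension; i++){
        int iOff = i * dimension;      /* Step 1: hoisted offsets */
        for(int j = 0; j < dimension; j++){
            int jOff = j * dimension;
            TYPE sum = 0;
            for(int k = 0; k < dimension; k++){
                sum += flatA[iOff + k] * flatB[jOff + k];
            }
            matrixC[i][j] = sum;
        }
    }

    gettimeofday(&t1, 0);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1000000.0;
}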