PARALLELIZING MATRIX
MULTIPLICATION IN
COMPILER DESIGN
BY
M.SRI NANDHINI,
II- MSC(CS),
NADAR SARASWATHI COLLEGE OF
ARTS & SCIENCE,
VADAPUTHUPATTI, THENI.
MATRIX MULTIPLICATION
• Let’s consider two matrices A (n×m) and B (m×p). Since the matrices are square, n = m = p.
• So each entry of the resultant matrix AB can be obtained like this,
• (AB)ij = ∑k=1..n Aik Bkj.
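• For example, with n = 2, the first entry is (AB)11 = A11 B11 + A12 B21.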
GENERATE RANDOM SQUARE MATRIX
• Let’s get into the implementation by creating random matrices for multiplication. Here we use the malloc function to allocate memory dynamically on the heap, because for testing we have to deal with matrices of different dimensions.
• Here we have defined the data type as double, which can be changed according to the use case. The #pragma omp parallel for directive parallelizes the loop, so we can initialize the matrix more efficiently.
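• A minimal sketch of such a generator (the name generateRandomMatrix and the rand()-based values are assumptions for illustration, not the original listing):

#include <stdlib.h>
#include <omp.h>

#define TYPE double  /* the slides use double; change per use case */

/* Hypothetical helper: allocates an n x n matrix on the heap and fills it
   with random values in parallel. Note: rand() is not guaranteed to be
   thread-safe; rand_r() with per-thread seeds would be safer here. */
TYPE** generateRandomMatrix(int dimension)
{
    TYPE** matrix = malloc(dimension * sizeof(TYPE*));
    for (int i = 0; i < dimension; i++)
        matrix[i] = malloc(dimension * sizeof(TYPE));

    #pragma omp parallel for
    for (int i = 0; i < dimension; i++)
        for (int j = 0; j < dimension; j++)
            matrix[i][j] = rand() / (TYPE)RAND_MAX;

    return matrix;
}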
TRADITIONAL MATRIX
MULTIPLICATION
• Without worrying much about performance, the direct implementation of matrix multiplication is sketched below.
• Operations occur sequentially for each element of the resultant matrix.
• Here matrix A and matrix B are the input matrices and matrix C is the resultant matrix, so we have to pass the resultant matrix into the function as a reference.
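• The original listing did not survive here, so this is a reconstruction of the direct triple loop (the name sequentialMultiply is an assumption; the signature mirrors parallelMultiply from the later slide):

/* Direct O(n^3) multiplication: every element of C is accumulated in order.
   matrixC is assumed to be zero-initialized. */
void sequentialMultiply(TYPE** matrixA, TYPE** matrixB,
                        TYPE** matrixC, int dimension)
{
    for (int i = 0; i < dimension; i++)
        for (int j = 0; j < dimension; j++)
            for (int k = 0; k < dimension; k++)
                matrixC[i][j] += matrixA[i][k] * matrixB[k][j];
}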
MATRIX MULTIPLICATION USING
PARALLEL FOR LOOPS
• When you implement loop parallelization in your algorithm, you can use a library like OpenMP to make the hard work easy, or you can write your own implementation with threads, where you have to handle load balancing, race conditions, etc.
PROGRAM
#include <sys/time.h>
#include <omp.h>

/* Multiplies two dimension x dimension matrices in parallel and returns the
   elapsed wall-clock time in seconds. matrixC is assumed zero-initialized,
   since the inner loop accumulates with +=. */
double parallelMultiply(TYPE** matrixA, TYPE** matrixB,
                        TYPE** matrixC, int dimension)
{
    struct timeval t0, t1;
    gettimeofday(&t0, 0);

    /* Each thread gets its own block of i values, i.e. its own rows of C. */
    #pragma omp parallel for
    for (int i = 0; i < dimension; i++) {
        for (int j = 0; j < dimension; j++) {
            for (int k = 0; k < dimension; k++) {
                matrixC[i][j] += matrixA[i][k] * matrixB[k][j];
            }
        }
    }

    gettimeofday(&t1, 0);
    double elapsed = (t1.tv_sec - t0.tv_sec) * 1.0 +
                     (t1.tv_usec - t0.tv_usec) / 1000000.0;
    return elapsed;
}
OPTIMIZED MATRIX MULTIPLICATION
USING PARALLEL FOR LOOPS
• Since our matrices are stored on the heap, they cannot be accessed as quickly as data stored on the stack. It is better to bring that data from the heap to the stack before starting the multiplication process, so we set up containers for it initially:
• TYPE flatA[MAX_DIM];
• TYPE flatB[MAX_DIM];
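• A sketch of the copy step (the name flatten is an assumption; MAX_DIM must be at least dimension * dimension, and as file-scope arrays these buffers are technically static storage rather than the call stack, though they serve the same purpose of fast, contiguous access):

TYPE flatA[MAX_DIM];
TYPE flatB[MAX_DIM];

/* Copy the heap matrices into contiguous row-major buffers. */
void flatten(TYPE** matrixA, TYPE** matrixB, int dimension)
{
    for (int i = 0; i < dimension; i++)
        for (int j = 0; j < dimension; j++) {
            flatA[i * dimension + j] = matrixA[i][j];
            flatB[i * dimension + j] = matrixB[i][j];
        }
}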
STEPS OF OPTIMIZED MATRIX
MULTIPLICATION IMPLEMENTATION
1. Put common calculations in one place:
Most of the time we tolerate small redundant calculations across a program when clarity matters more than performance; here we hoist them out of the inner loops (for example, the row offset i * dimension).
2. Cache-friendly algorithm implementation:
We all know that memory has a linear arrangement.
Cont…
So every N-dimensional array is stored sequentially in memory (row-major in C), and iterating over it in that same order keeps memory accesses cache friendly.
3. Using stack vs. heap memory efficiently:
• Accessing the stack is faster than accessing heap memory, but the stack has limited space. We have stored the large input matrices in heap memory; for efficient intermediate calculations we use the stack with predefined memory allocations, as in the sketch after this list.
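A sketch combining the three steps (a reconstruction under assumptions, not the original listing; it assumes the includes, TYPE, MAX_DIM, flatA/flatB and flatten() from the earlier slides, and that flatten(matrixA, matrixB, dimension) has already been called):

double optimizedParallelMultiply(TYPE** matrixC, int dimension)
{
    struct timeval t0, t1;
    gettimeofday(&t0, 0);

    #pragma omp parallel for shared(matrixC) num_threads(40)
    for (int i = 0; i < dimension; i++) {
        int iOff = i * dimension;              /* step 1: hoisted offset */
        for (int j = 0; j < dimension; j++) {
            TYPE tot = 0;
            int kOff = 0;
            for (int k = 0; k < dimension; k++) {
                /* step 2: flatA is read sequentially; kOff walks down
                   column j of flatB without recomputing k * dimension */
                tot += flatA[iOff + k] * flatB[kOff + j];
                kOff += dimension;
            }
            /* step 3: tot accumulates in a stack slot/register;
               the heap matrix C is written only once per element */
            matrixC[i][j] = tot;
        }
    }

    gettimeofday(&t1, 0);
    return (t1.tv_sec - t0.tv_sec) * 1.0 +
           (t1.tv_usec - t0.tv_usec) / 1000000.0;
}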
Cont…
• Here we launch 40 threads to do the multiplication. Since we are dealing with dimensions of 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800 and 2000, all divisible by 40, the workload can be divided equally among the threads.
• In the OpenMP pragma we have explicitly declared matrixC as a shared resource; since each thread writes only its own rows of C, this avoids race conditions.
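• A hypothetical test driver over those dimensions (zeroMatrix is an assumed helper returning a calloc-based, zero-initialized matrix, which parallelMultiply needs since it accumulates with +=):

#include <stdio.h>

int main(void)
{
    for (int dim = 200; dim <= 2000; dim += 200) {
        TYPE** A = generateRandomMatrix(dim);
        TYPE** B = generateRandomMatrix(dim);
        TYPE** C = zeroMatrix(dim);   /* assumed helper, zero-initialized */
        printf("dim=%d time=%f s\n", dim, parallelMultiply(A, B, C, dim));
        /* a full program would free A, B and C here */
    }
    return 0;
}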
THANK YOU!!!