International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.5, October 2012
DOI : 10.5121/ijcseit.2012.2506
Performance Analysis of Parallel Algorithms on
Multi-core System using OpenMP
Sanjay Kumar Sharma1, Dr. Kusum Gupta2
1 Department of Computer Science, Banasthali University, Rajasthan, India
skumar2.sharma@gmail.com
2 Department of Computer Science, Banasthali University, Rajasthan, India
gupta_kusum@yahoo.com
Abstract
Current multi-core architectures have become popular due to their performance and efficient simultaneous processing of multiple tasks. Today's parallel algorithms focus on multi-core systems. The design of parallel algorithms and the measurement of their performance are major issues in a multi-core environment. If one wishes to execute a single application faster, the application must be divided into subtasks or threads to deliver the desired result. Numerical problems, especially the solution of systems of linear equations, have many applications in science and engineering. This paper describes and analyzes parallel algorithms for computing the solution of a dense system of linear equations and for approximately computing the value of π using the OpenMP interface. The performance (speedup) of the parallel algorithms on a multi-core system is presented. The experimental results on a multi-core processor show that the proposed parallel algorithms achieve good performance (speedup) compared to the sequential ones.
Keywords
Multi-core Architecture, OpenMP, Parallel algorithms, Performance analysis.
1. Introduction
A number of tools are available to exploit the parallelism in developed applications. Multithreading is a technique that allows multiple threads to execute in parallel. Computer architectures have been classified into two categories, instruction level and data level; this classification is given by Flynn's taxonomy [1]. The performance of a parallel application can be improved using multi-core technology.
Multi-core technology means having more than one core inside a single chip. This opens the way to parallel computation, where multiple parts of a program are executed in parallel at the same time [2]. The factor that motivates the design of parallel algorithms for multi-core systems is performance. The performance of a parallel algorithm is sensitive to the number of cores available in the system, core-to-core latencies, memory hierarchy design, and synchronization costs. Software development tools must abstract these variations so that software performance continues to obtain the benefits of Moore's law.
In a multi-core environment the sequential computing paradigm is inefficient, while the usual parallel computing paradigm may be suitable. One of the most important numerical problems is the solution of systems of linear equations. Systems of linear equations arise in science domains such as fusion energy, structural engineering, and the method-of-moments formulation of Maxwell's equations.
2. Related work
With the invention of multi-core technology, parallel algorithms can be used to improve application performance. Multi-core technologies support multithreading, executing multiple threads in parallel, and hence the performance of applications can be improved. We studied a number of algorithms on multi-core/parallel machines and the performance metrics of numerical algorithms [3], [4]. Research on hybrid multi-core and GPU architectures is also an emerging trend in the multi-core era [5].
In order to achieve high performance in an application, one needs to develop a correct parallel algorithm, which requires suitable hardware and a language interface such as OpenMP. OpenMP supports multithreading, and a program can be developed so that all processors are kept busy to improve performance. The works cited above discuss the performance of the algorithms in general; our approach is to measure the performance of selected numerical algorithms on a multi-core system using OpenMP programming techniques.
3. Overview of Proposed Work
There are numerical problems which are large and complex, and whose solution takes considerable time using a sequential algorithm on a single-processor or multiprocessor machine. A fast solution to these problems can be obtained using parallel algorithms on a multi-core system. In this paper we select two numerical problems. The first is to approximately compute the value of π using a method of numerical integration, and the second is the solution of a system of linear equations [6]. We describe the techniques and algorithms involved in achieving good performance by reducing execution time through OpenMP parallelization on a multi-core machine. We tested the algorithms by writing programs using OpenMP on a multi-core system and measured their performance via their execution times.

In our proposed work, we measure the execution time taken by the sequential and parallel programs and also compute the speedup. The schematic diagram of our proposed work for the solution of a system of linear equations is shown in Fig. 1.
Figure 1 Modules of the parallel algorithm: given the number of linear equations (N), the sequential and parallel versions are executed, the execution time of each is calculated, and the results are compared and analyzed.
4. Programming in OpenMP
The OpenMP Application Programming Interface (API) was developed to enable shared-memory parallel programming. The OpenMP API is a set of compiler directives, library routines, and environment variables that specify shared-memory parallelism in Fortran and C/C++ programs [7]. It provides three kinds of directives, for parallel work sharing, data environment, and synchronization, to exploit multi-core, multithreaded processors. OpenMP provides means for the programmer to create teams of threads for parallel execution, specify how to share work among the members of a team, declare both shared and private variables, and synchronize threads and enable them to perform certain operations exclusively [7].
OpenMP is based on the fork-and-join execution model, in which a program starts as a single thread called the master thread [7]. This thread executes sequentially until the first parallel construct is encountered. This construct defines a parallel region (a block which can be executed by a number of threads in parallel). The master thread creates a team of threads that execute the statements contained in the parallel region concurrently. There is an implicit synchronization at the end of the parallel region, after which only the master thread continues its execution [7].
4.1 Creating an OpenMP Program
OpenMP's directives can be used in a program to tell the compiler which instructions to execute in parallel and how to distribute them among the threads [7]. The first step in creating a parallel program from a sequential one using OpenMP is to identify the parallelism it contains. This requires finding instructions, sequences of instructions, or even large sections of code that can be executed concurrently by different processors. This is the most important task when one develops a parallel application.
The second step in creating an OpenMP program is to express, using OpenMP, the parallelism that has been identified [7]. A huge practical benefit of OpenMP is that it can be applied to incrementally create a parallel program from existing sequential code. The developer can insert directives into one portion of the program and leave the rest in its sequential form. Once the resulting program version has been successfully compiled and tested, another portion of the code can be parallelized. The programmer can stop this process once the desired speedup has been obtained [7].
4.2 Performance of Parallel Algorithms
The amount of performance benefit an application realizes by using OpenMP depends entirely on the extent to which it can be parallelized. Amdahl's law specifies the maximum speedup that can be expected by parallelizing portions of a serial program [8]. Essentially, it states that the maximum speedup (S) of a program is

S = 1 / ((1 - F) + (F / N))

where F is the fraction of the total serial execution time taken by the portion of code that can be parallelized and N is the number of processors over which the parallel portion of the code runs.
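As a quick check of Amdahl's bound (a sketch with illustrative values, not figures from the paper): with F = 0.9 and N = 2 the bound is 1/((1 - 0.9) + 0.9/2) ≈ 1.82, close to the roughly two-fold speedups reported in the experiments below.

```c
/* Amdahl's law: maximum speedup when a fraction F of the serial
   runtime is parallelized perfectly across N processors. */
double amdahl_speedup(double F, int N) {
    return 1.0 / ((1.0 - F) + F / (double)N);
}
```

For example, amdahl_speedup(0.9, 2) evaluates to about 1.82, while a fully parallelizable program (F = 1) on N processors gives exactly N.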
The metric that has been used to evaluate the performance of the parallel algorithms is the speedup [8]. It is defined as

Sp = T1 / Tp
where T1 denotes the execution time of the best known sequential algorithm on a single-processor machine, and Tp is the execution time of the parallel algorithm on a P-processor machine. In other words, speedup refers to how much faster the parallel algorithm is than the corresponding sequential algorithm. The linear or ideal speedup is obtained when Sp = P.

5. Proposed Methodology

5.1 Calculation of π

A function f(x) can be used to approximate the value of π using numerical integration. Consider the evaluation of the definite integral

I = ∫[a, b] f(x) dx

where f(x) is called the integrand, a is the lower, and b is the upper limit of integration. The integral of f(x) can be evaluated numerically by splitting the interval [a, b] into n equally spaced subintervals. We assume that the subintervals are of constant width h = (b - a)/n. A method of integration (Simpson's 1/3 rule) has been used to approximately compute the value of π (the standard choice, f(x) = 4/(1 + x²) with a = 0 and b = 1, gives exactly π). The sequential algorithm is given below.

Sequential Algorithm
1. Input N // N: number of intervals; a=0 and b=1 are the limits
2. h = (b-a) / N
3. sum = f(a) + f(b) + 4*f(a+h)
4. for i = 3 to N-1 step +2 do
5. x = 2*f(a + (i-1)*h) + 4*f(a + i*h)
6. sum = sum + x
7. End loop [i]
8. Int = (h/3) * sum
9. Print 'Integral is = ', Int

5.2 Solution of System of Linear Equations

The solution of a system of linear equations is an assignment of values to the variables that satisfies the equations. To solve the system of linear equations, we considered a direct method: Gaussian elimination. It is a numerical method for solving the system of linear equations AX = B, where A is a known matrix of size n×n, X is the required solution vector, and B is a known vector of size n. Consider the n linear equations in n unknowns:

a11x1 + a12x2 + … + a1nxn = a1,n+1
a21x1 + a22x2 + … + a2nxn = a2,n+1
a31x1 + a32x2 + … + a3nxn = a3,n+1
… … … … … … … …
an1x1 + an2x2 + … + annxn = an,n+1, (1)

where the ai,j and ai,n+1 are known constants and the xi are unknowns.
The parallel algorithm used to solve a dense system of linear equations by Gaussian elimination with partial pivoting is developed as follows. The Gauss method is based on transformations of the linear equations which do not change the solution [9][10][11]: multiplication of any equation by a nonzero constant, permutation of equations, and addition of any equation to another equation. The algorithm consists of two phases:
1. In the first phase, the pivot element is identified as the coefficient with the largest absolute value in the first column. The first row is exchanged with the row containing that element, and the first variable is then eliminated from the subsequent equations using the transformations above. When the second row becomes the pivot row, the coefficients in the second column from the second row to the nth row are searched to locate the largest one, and the second row is exchanged with the row containing it. This procedure continues until (n-1) unknowns have been eliminated [11].
2. The second phase is known as the backward substitution phase, which is concerned with the actual solution of the equations and uses back substitution on the reduced upper-triangular system [11].
The sequential algorithm of Gauss elimination is given below:
Sequential algorithm
1. Input: Given Matrix a [1: n, 1: n+1]
2. Output: x [1: n]
3. // Triangularization process
4. for k = 1 to n-1
5. Find pivot row by searching the row with greatest absolute value among the elements of
column k
6. swap the pivot row with row k
7. for i = k+1 to n
8. mi,k = ai,k / ak,k
9. for j = k to n+1
10. ai,j = ai,j – mi,k * ak,j
11. End loop [j]
12. End loop [i]
13. End loop [k]
14. // Back substitution process
15. xn = an,n+1 / an,n
16. for i = n-1 to 1 step -1 do
17. sum = 0
18. for j = i+1 to n do
19. sum = sum + ai,j * xj
20. End loop [j]
21. xi = ( ai,n+1 - sum )/ai,i
22. End loop[i]
6. Design and Implementation
First we studied the typical behavior of the sequential algorithms and identified the sections of operations that can be executed in parallel. In the designed parallel algorithms we have used the #pragma directive, which indicates that the iterations of a loop will execute in parallel on different processors. The parallel algorithms for both problems are given below.
We computed the value of π using numerical integration. There are six levels of parallelization of numerical integration; we used the most efficient level to compute the value of the definite integral. In the parallel algorithm we inserted the #pragma directive to parallelize the loop.
Parallel Algorithm- Computation of Pi Value
1. Input N // N no. of intervals, a=0 and b=1 are limits
2. h = (b-a) / N
3. Start the clock
4. sum= f(a) + f(b) + 4*f(a+h)
5. Insert #pragma directive to parallelize the loop i
6. i = 3 to N-1 step +2 do
7. x = 2*f(a + (i-1)*h) + 4*f(a + i*h)
8. sum = sum+ x
9. End loop [i]
10. Int = (h/3) * sum
11. Stop clock
12. Display time taken and value of Int
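The parallel algorithm above can be sketched in C with OpenMP (our illustration, not the authors' code). The standard integrand f(x) = 4/(1 + x²) on [0, 1], whose integral is π, is assumed; the reduction(+:sum) clause gives each thread a private partial sum and combines them at the join, avoiding a data race on sum.

```c
#include <omp.h>

/* Standard integrand whose integral over [0, 1] equals pi. */
static double f(double x) { return 4.0 / (1.0 + x * x); }

/* Simpson's 1/3 rule on [0, 1] with N subintervals (N even):
   the boundary terms and the first odd point are accumulated
   serially, the remaining interior points in one parallel loop. */
double simpson_pi(long N) {
    const double a = 0.0, b = 1.0;
    double h = (b - a) / N;
    double sum = f(a) + f(b) + 4.0 * f(a + h);
    long i;
    /* iterations are independent; reduction(+:sum) combines the
       per-thread partial sums at the end of the loop */
    #pragma omp parallel for reduction(+:sum)
    for (i = 3; i <= N - 1; i += 2)
        sum += 2.0 * f(a + (i - 1) * h) + 4.0 * f(a + i * h);
    return (h / 3.0) * sum;
}
```

With N = 1000 the result already agrees with π to far better than single precision, since Simpson's rule converges as O(h⁴).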
In the sequential algorithm of Gauss elimination we found that the inner loops indexed by i and j can be executed in parallel without affecting the result. In the parallel algorithm, we insert the #pragma directive to parallelize the loops.
Parallel algorithm- Gauss elimination
1. Input: a [1: n, 1: n+1] // Read the matrix data
2. Output: x [1: n]
3. Set the number of threads
4. Start clock
5. // Triangularization process
6. for k = 1 to n-1
7. Insert #pragma directive to parallelize
8. for i = k+1 to n
9. mi,k = ai,k / ak,k
10. for j = k to n+1
11. ai,j = ai,j – mi,k * ak,j
12. End loop [j]
13. End loop [i]
14. End loop [k]
15. // Back substitution process
16. xn = an,n+1 / an,n
17. for i = n-1 to 1 step -1 do
18. sum = 0
19. for j = i+1 to n do
20. sum = sum + ai,j * xj
21. End loop [j]
22. xi = ( ai,n+1 - sum )/ai,i
23. End loop[i]
24. Stop clock
25. Display time taken and solution vector
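The parallel Gauss elimination above can be sketched in C with OpenMP (our illustration of the paper's scheme, not the authors' code; arrays are 0-indexed here and the right-hand side B is stored as column n of the augmented matrix):

```c
#include <math.h>

/* Solve A x = b by Gaussian elimination with partial pivoting.
   a is an n x (n+1) augmented matrix in row-major order; the
   solution is written into x. Returns 0 on success, -1 if singular. */
int gauss_solve(int n, double *a, double *x) {
    int k, i, j;
    for (k = 0; k < n - 1; k++) {
        /* partial pivoting: find the largest |a[i][k]| in column k */
        int piv = k;
        for (i = k + 1; i < n; i++)
            if (fabs(a[i * (n + 1) + k]) > fabs(a[piv * (n + 1) + k]))
                piv = i;
        if (a[piv * (n + 1) + k] == 0.0) return -1;
        if (piv != k)   /* swap the pivot row with row k */
            for (j = k; j <= n; j++) {
                double t = a[k * (n + 1) + j];
                a[k * (n + 1) + j] = a[piv * (n + 1) + j];
                a[piv * (n + 1) + j] = t;
            }
        /* eliminate column k below the pivot; each row update is
           independent, so the i loop can run in parallel */
        #pragma omp parallel for private(j)
        for (i = k + 1; i < n; i++) {
            double m = a[i * (n + 1) + k] / a[k * (n + 1) + k];
            for (j = k; j <= n; j++)
                a[i * (n + 1) + j] -= m * a[k * (n + 1) + j];
        }
    }
    /* back substitution on the upper-triangular system */
    for (i = n - 1; i >= 0; i--) {
        double sum = 0.0;
        for (j = i + 1; j < n; j++)
            sum += a[i * (n + 1) + j] * x[j];
        if (a[i * (n + 1) + i] == 0.0) return -1;
        x[i] = (a[i * (n + 1) + n] - sum) / a[i * (n + 1) + i];
    }
    return 0;
}
```

The pivot search and row swap remain serial within each step k; only the independent row updates are parallelized, which is where almost all of the O(n³) work lies.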
7. Experimental Results
There are two versions of each algorithm: sequential and parallel. The programs were executed on an Intel Core 2 Duo processor machine. We analyzed the performance using the results and finally derived the conclusions. The Intel C++ compiler 10.0 under Microsoft Visual Studio 8.0 was used for compilation and execution; the Intel C++ compiler supports multithreaded parallelism with the /Qopenmp flag. The Origin 6.1 software was used to plot the graphs from the data obtained in the experiments.
In the first experiment, the execution times of both the sequential and parallel algorithms were recorded to measure the performance (speedup) of the parallel algorithm against the sequential one. We used the value of π from Mathematica to check the accuracy of the computed value; Mathematica is known for its ability to do computations with arbitrary precision.
The data presented in Table 1 give the execution times taken by the sequential and parallel programs, the difference from Mathematica's value of π, and the speedup. We plot the data in Table 1 to analyze the performance of the parallel algorithm, as shown in Fig. 2. The results show a large difference between the time required by the parallel algorithm and that taken by the sequential algorithm: the parallel algorithm is approximately twice as fast as the sequential one.
Table 1 Performance comparison of sequential and parallel algorithms to compute the value of π
("Difference" is the deviation of the computed value from Mathematica's value of π.)

Sr. No. | No. of Intervals | Sequential Time (ms) | Sequential Difference | Parallel Time (ms) | Parallel Difference | Speedup (S)
1 | 1000    | 0  | -1.67*10^-7  | 0  | -1.67*10^-7  | 0
2 | 10000   | 0  | -1.67*10^-9  | 0  | -1.67*10^-9  | 0
3 | 100000  | 15 | -1.67*10^-11 | 8  | -1.67*10^-11 | 1.875
4 | 1000000 | 78 | -4.44*10^-16 | 40 | -2.04*10^-16 | 1.950
Figure 2 Execution times of sequential and parallel computation of π
In the second experiment, we implemented the sequential and parallel algorithms for finding the solution of a system of linear equations. We tested both algorithms on systems of different sizes and recorded their execution times. In the programs we used the function timeGetTime() to measure the time taken by the sequential and parallel algorithms.
The data in Table 2 give the execution times taken by the sequential and parallel programs for systems of linear equations of different sizes, and the speedup. The results show that the parallel algorithm is more efficient than its sequential counterpart. We plot the data in Table 2 to analyze the performance (speedup) of the parallel algorithm, as presented in Fig. 3. It shows that the parallel algorithm saves a significant amount of execution time. On average, the speedup of the parallel algorithm over the corresponding sequential algorithm is approximately two.
Table 2 Performance comparison of sequential and parallel algorithms

No. of Equations | Sequential Time (ms) | Parallel Time (ms) | Speedup (S)
100 | 0      | 0      | 0
125 | 31.5   | 16.50  | 1.90
150 | 46.5   | 24.25  | 1.91
175 | 70.5   | 36.50  | 1.93
200 | 124.5  | 63.50  | 1.96
225 | 183.25 | 92.25  | 1.98
250 | 249.5  | 125.25 | 1.990
275 | 327.5  | 165.50 | 1.97
300 | 379.25 | 190.00 | 1.996
Figure 3 Execution times of the sequential and parallel Gauss elimination algorithms
8. Conclusions and Future Enhancement
In this work we studied how OpenMP programming techniques benefit a multi-core system. We computed the value of π and solved systems of linear equations using OpenMP on a multi-core machine, improving performance by reducing execution time, and presented the execution times of both the serial and parallel algorithms for each problem.
Based on our study we arrive at the following conclusions: (1) parallelizing a serial algorithm using OpenMP increases performance; (2) on a multi-core system OpenMP provides a substantial performance increase, and parallelization can be done with careful, small changes; (3) the parallel algorithms are approximately twice as fast as the sequential ones, and the speedup is nearly linear.
Future enhancement of this work is worthwhile, as parallelization using OpenMP is gaining popularity. This work will be carried out in the near future for real-time implementations at larger scale and for high-performance systems.
References
[1] Noronha R., Panda D.K., "Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support", IEEE Computer, 2007.
[2] Kulkarni, S. G., "Analysis of Multi-Core System Performance through OpenMP", National Conference on Advanced Computing and Communication Technology, IJTES, Vol-1, No-2, pp. 189-192, July-Sep 2010.
[3] Gallivan K. A., Plemmons R. J., and Sameh A. H.,"Parallel algorithms for dense linear algebra
computations," SIAM Rev., vol. 32, pp. 54-135, March 1990.
[4] Gallivan K. A., Jalby W., Malony A. D., and Wijshoff H. A. G., "Performance prediction for parallel
numerical algorithms.," International Journal of High Speed Computing, vol. 3, no. 1, pp. 31-62,
1991.
[5] Horton M., Tomov S., and Dongarra J., "A class of hybrid LAPACK algorithms for multi-core and
GPU architectures," in Proceedings of the 2011 Symposium on Application Accelerators in High-
Performance Computing, SAAHPC ’11, (Washington, DC, USA), pp. 150-158, IEEE Computer
Society, 2011.
[6] Wilkinson, B., Allen, M., “Parallel Programming”, Pearson Education (Singapore), 2002.
[7] Barbara, C., Jost, G., Pas, R.V., “Using OpenMP: portable shared memory parallel programming”,
The MIT Press, Cambridge, Massachusetts, London, 2008.
[8] Quinn, M. J, “Parallel Programming in C with MPI and OpenMP”, McGraw-Hill Higher Education,
2004.
[9] Angadi, Suneeta H., Raju, G. T. & Abhishek, B., "Software Performance Analysis with Parallel Programming Approaches", International Journal of Computer Science and Informatics, ISSN (PRINT): 2231-5292, Vol-1, Iss-4, 2012.
[10] Bertsekas, Dimitri P., Tsitsiklis, John N., "Parallel and Distributed Computation: Numerical Methods", Massachusetts Institute of Technology, Prentice-Hall, Inc., 1989.
[11] Vijayalakshmi, S., Mohan, R., Mukund, S., Kothari, D.P., "LINPACK: Power-Performance Analysis of Multi-Core Processors Using OpenMP", International Journal of Computer Applications, Vol. 42, No. 1, April 2012.

 
Lock free parallel access collections
Lock free parallel access collectionsLock free parallel access collections
Lock free parallel access collectionsijdpsjournal
 
Performance evaluation of larger matrices over cluster of four nodes using mpi
Performance evaluation of larger matrices over cluster of four nodes using mpiPerformance evaluation of larger matrices over cluster of four nodes using mpi
Performance evaluation of larger matrices over cluster of four nodes using mpieSAT Journals
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Rusif Eyvazli
 
Debugging and optimization of multi-thread OpenMP-programs
Debugging and optimization of multi-thread OpenMP-programsDebugging and optimization of multi-thread OpenMP-programs
Debugging and optimization of multi-thread OpenMP-programsPVS-Studio
 
Introduction to data structures and Algorithm
Introduction to data structures and AlgorithmIntroduction to data structures and Algorithm
Introduction to data structures and AlgorithmDhaval Kaneria
 
Cupdf.com introduction to-data-structures-and-algorithm
Cupdf.com introduction to-data-structures-and-algorithmCupdf.com introduction to-data-structures-and-algorithm
Cupdf.com introduction to-data-structures-and-algorithmTarikuDabala1
 
Introduction to data structures and Algorithm
Introduction to data structures and AlgorithmIntroduction to data structures and Algorithm
Introduction to data structures and AlgorithmDhaval Kaneria
 
SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY
SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY
SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY cscpconf
 

Similar to Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP (20)

IRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CL
 
Parallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPParallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MP
 
Towards high performance computing(hpc) through parallel programming paradigm...
Towards high performance computing(hpc) through parallel programming paradigm...Towards high performance computing(hpc) through parallel programming paradigm...
Towards high performance computing(hpc) through parallel programming paradigm...
 
An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey
 An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey
An Adjacent Analysis of the Parallel Programming Model Perspective: A Survey
 
1844 1849
1844 18491844 1849
1844 1849
 
1844 1849
1844 18491844 1849
1844 1849
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMP
 
LOCK-FREE PARALLEL ACCESS COLLECTIONS
LOCK-FREE PARALLEL ACCESS COLLECTIONSLOCK-FREE PARALLEL ACCESS COLLECTIONS
LOCK-FREE PARALLEL ACCESS COLLECTIONS
 
Lock free parallel access collections
Lock free parallel access collectionsLock free parallel access collections
Lock free parallel access collections
 
Aq4301224227
Aq4301224227Aq4301224227
Aq4301224227
 
Performance evaluation of larger matrices over cluster of four nodes using mpi
Performance evaluation of larger matrices over cluster of four nodes using mpiPerformance evaluation of larger matrices over cluster of four nodes using mpi
Performance evaluation of larger matrices over cluster of four nodes using mpi
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...
 
Debugging and optimization of multi-thread OpenMP-programs
Debugging and optimization of multi-thread OpenMP-programsDebugging and optimization of multi-thread OpenMP-programs
Debugging and optimization of multi-thread OpenMP-programs
 
Introduction to data structures and Algorithm
Introduction to data structures and AlgorithmIntroduction to data structures and Algorithm
Introduction to data structures and Algorithm
 
Cupdf.com introduction to-data-structures-and-algorithm
Cupdf.com introduction to-data-structures-and-algorithmCupdf.com introduction to-data-structures-and-algorithm
Cupdf.com introduction to-data-structures-and-algorithm
 
Introduction to data structures and Algorithm
Introduction to data structures and AlgorithmIntroduction to data structures and Algorithm
Introduction to data structures and Algorithm
 
SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY
SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY
SPEED-UP IMPROVEMENT USING PARALLEL APPROACH IN IMAGE STEGANOGRAPHY
 

More from IJCSEIT Journal

ANALYSIS OF EXISTING TRAILERS’ CONTAINER LOCK SYSTEMS
ANALYSIS OF EXISTING TRAILERS’ CONTAINER LOCK SYSTEMS ANALYSIS OF EXISTING TRAILERS’ CONTAINER LOCK SYSTEMS
ANALYSIS OF EXISTING TRAILERS’ CONTAINER LOCK SYSTEMS IJCSEIT Journal
 
A MODEL FOR REMOTE ACCESS AND PROTECTION OF SMARTPHONES USING SHORT MESSAGE S...
A MODEL FOR REMOTE ACCESS AND PROTECTION OF SMARTPHONES USING SHORT MESSAGE S...A MODEL FOR REMOTE ACCESS AND PROTECTION OF SMARTPHONES USING SHORT MESSAGE S...
A MODEL FOR REMOTE ACCESS AND PROTECTION OF SMARTPHONES USING SHORT MESSAGE S...IJCSEIT Journal
 
BIOMETRIC APPLICATION OF INTELLIGENT AGENTS IN FAKE DOCUMENT DETECTION OF JOB...
BIOMETRIC APPLICATION OF INTELLIGENT AGENTS IN FAKE DOCUMENT DETECTION OF JOB...BIOMETRIC APPLICATION OF INTELLIGENT AGENTS IN FAKE DOCUMENT DETECTION OF JOB...
BIOMETRIC APPLICATION OF INTELLIGENT AGENTS IN FAKE DOCUMENT DETECTION OF JOB...IJCSEIT Journal
 
FACE RECOGNITION USING DIFFERENT LOCAL FEATURES WITH DIFFERENT DISTANCE TECHN...
FACE RECOGNITION USING DIFFERENT LOCAL FEATURES WITH DIFFERENT DISTANCE TECHN...FACE RECOGNITION USING DIFFERENT LOCAL FEATURES WITH DIFFERENT DISTANCE TECHN...
FACE RECOGNITION USING DIFFERENT LOCAL FEATURES WITH DIFFERENT DISTANCE TECHN...IJCSEIT Journal
 
BIOMETRICS AUTHENTICATION TECHNIQUE FOR INTRUSION DETECTION SYSTEMS USING FIN...
BIOMETRICS AUTHENTICATION TECHNIQUE FOR INTRUSION DETECTION SYSTEMS USING FIN...BIOMETRICS AUTHENTICATION TECHNIQUE FOR INTRUSION DETECTION SYSTEMS USING FIN...
BIOMETRICS AUTHENTICATION TECHNIQUE FOR INTRUSION DETECTION SYSTEMS USING FIN...IJCSEIT Journal
 
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...IJCSEIT Journal
 
Effect of Interleaved FEC Code on Wavelet Based MC-CDMA System with Alamouti ...
Effect of Interleaved FEC Code on Wavelet Based MC-CDMA System with Alamouti ...Effect of Interleaved FEC Code on Wavelet Based MC-CDMA System with Alamouti ...
Effect of Interleaved FEC Code on Wavelet Based MC-CDMA System with Alamouti ...IJCSEIT Journal
 
FUZZY WEIGHTED ASSOCIATIVE CLASSIFIER: A PREDICTIVE TECHNIQUE FOR HEALTH CARE...
FUZZY WEIGHTED ASSOCIATIVE CLASSIFIER: A PREDICTIVE TECHNIQUE FOR HEALTH CARE...FUZZY WEIGHTED ASSOCIATIVE CLASSIFIER: A PREDICTIVE TECHNIQUE FOR HEALTH CARE...
FUZZY WEIGHTED ASSOCIATIVE CLASSIFIER: A PREDICTIVE TECHNIQUE FOR HEALTH CARE...IJCSEIT Journal
 
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALGENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALIJCSEIT Journal
 
DETECTION OF CONCEALED WEAPONS IN X-RAY IMAGES USING FUZZY K-NN
DETECTION OF CONCEALED WEAPONS IN X-RAY IMAGES USING FUZZY K-NNDETECTION OF CONCEALED WEAPONS IN X-RAY IMAGES USING FUZZY K-NN
DETECTION OF CONCEALED WEAPONS IN X-RAY IMAGES USING FUZZY K-NNIJCSEIT Journal
 
META-HEURISTICS BASED ARF OPTIMIZATION FOR IMAGE RETRIEVAL
META-HEURISTICS BASED ARF OPTIMIZATION FOR IMAGE RETRIEVALMETA-HEURISTICS BASED ARF OPTIMIZATION FOR IMAGE RETRIEVAL
META-HEURISTICS BASED ARF OPTIMIZATION FOR IMAGE RETRIEVALIJCSEIT Journal
 
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...IJCSEIT Journal
 
M-FISH KARYOTYPING - A NEW APPROACH BASED ON WATERSHED TRANSFORM
M-FISH KARYOTYPING - A NEW APPROACH BASED ON WATERSHED TRANSFORMM-FISH KARYOTYPING - A NEW APPROACH BASED ON WATERSHED TRANSFORM
M-FISH KARYOTYPING - A NEW APPROACH BASED ON WATERSHED TRANSFORMIJCSEIT Journal
 
RANDOMIZED STEGANOGRAPHY IN SKIN TONE IMAGES
RANDOMIZED STEGANOGRAPHY IN SKIN TONE IMAGESRANDOMIZED STEGANOGRAPHY IN SKIN TONE IMAGES
RANDOMIZED STEGANOGRAPHY IN SKIN TONE IMAGESIJCSEIT Journal
 
A NOVEL WINDOW FUNCTION YIELDING SUPPRESSED MAINLOBE WIDTH AND MINIMUM SIDELO...
A NOVEL WINDOW FUNCTION YIELDING SUPPRESSED MAINLOBE WIDTH AND MINIMUM SIDELO...A NOVEL WINDOW FUNCTION YIELDING SUPPRESSED MAINLOBE WIDTH AND MINIMUM SIDELO...
A NOVEL WINDOW FUNCTION YIELDING SUPPRESSED MAINLOBE WIDTH AND MINIMUM SIDELO...IJCSEIT Journal
 
CSHURI – Modified HURI algorithm for Customer Segmentation and Transaction Pr...
CSHURI – Modified HURI algorithm for Customer Segmentation and Transaction Pr...CSHURI – Modified HURI algorithm for Customer Segmentation and Transaction Pr...
CSHURI – Modified HURI algorithm for Customer Segmentation and Transaction Pr...IJCSEIT Journal
 
AN EFFICIENT IMPLEMENTATION OF TRACKING USING KALMAN FILTER FOR UNDERWATER RO...
AN EFFICIENT IMPLEMENTATION OF TRACKING USING KALMAN FILTER FOR UNDERWATER RO...AN EFFICIENT IMPLEMENTATION OF TRACKING USING KALMAN FILTER FOR UNDERWATER RO...
AN EFFICIENT IMPLEMENTATION OF TRACKING USING KALMAN FILTER FOR UNDERWATER RO...IJCSEIT Journal
 
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASE
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASEUSING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASE
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASEIJCSEIT Journal
 
FACTORS AFFECTING ACCEPTANCE OF WEB-BASED TRAINING SYSTEM: USING EXTENDED UTA...
FACTORS AFFECTING ACCEPTANCE OF WEB-BASED TRAINING SYSTEM: USING EXTENDED UTA...FACTORS AFFECTING ACCEPTANCE OF WEB-BASED TRAINING SYSTEM: USING EXTENDED UTA...
FACTORS AFFECTING ACCEPTANCE OF WEB-BASED TRAINING SYSTEM: USING EXTENDED UTA...IJCSEIT Journal
 
PROBABILISTIC INTERPRETATION OF COMPLEX FUZZY SET
PROBABILISTIC INTERPRETATION OF COMPLEX FUZZY SETPROBABILISTIC INTERPRETATION OF COMPLEX FUZZY SET
PROBABILISTIC INTERPRETATION OF COMPLEX FUZZY SETIJCSEIT Journal
 

More from IJCSEIT Journal (20)

ANALYSIS OF EXISTING TRAILERS’ CONTAINER LOCK SYSTEMS
ANALYSIS OF EXISTING TRAILERS’ CONTAINER LOCK SYSTEMS ANALYSIS OF EXISTING TRAILERS’ CONTAINER LOCK SYSTEMS
ANALYSIS OF EXISTING TRAILERS’ CONTAINER LOCK SYSTEMS
 
A MODEL FOR REMOTE ACCESS AND PROTECTION OF SMARTPHONES USING SHORT MESSAGE S...
A MODEL FOR REMOTE ACCESS AND PROTECTION OF SMARTPHONES USING SHORT MESSAGE S...A MODEL FOR REMOTE ACCESS AND PROTECTION OF SMARTPHONES USING SHORT MESSAGE S...
A MODEL FOR REMOTE ACCESS AND PROTECTION OF SMARTPHONES USING SHORT MESSAGE S...
 
BIOMETRIC APPLICATION OF INTELLIGENT AGENTS IN FAKE DOCUMENT DETECTION OF JOB...
BIOMETRIC APPLICATION OF INTELLIGENT AGENTS IN FAKE DOCUMENT DETECTION OF JOB...BIOMETRIC APPLICATION OF INTELLIGENT AGENTS IN FAKE DOCUMENT DETECTION OF JOB...
BIOMETRIC APPLICATION OF INTELLIGENT AGENTS IN FAKE DOCUMENT DETECTION OF JOB...
 
FACE RECOGNITION USING DIFFERENT LOCAL FEATURES WITH DIFFERENT DISTANCE TECHN...
FACE RECOGNITION USING DIFFERENT LOCAL FEATURES WITH DIFFERENT DISTANCE TECHN...FACE RECOGNITION USING DIFFERENT LOCAL FEATURES WITH DIFFERENT DISTANCE TECHN...
FACE RECOGNITION USING DIFFERENT LOCAL FEATURES WITH DIFFERENT DISTANCE TECHN...
 
BIOMETRICS AUTHENTICATION TECHNIQUE FOR INTRUSION DETECTION SYSTEMS USING FIN...
BIOMETRICS AUTHENTICATION TECHNIQUE FOR INTRUSION DETECTION SYSTEMS USING FIN...BIOMETRICS AUTHENTICATION TECHNIQUE FOR INTRUSION DETECTION SYSTEMS USING FIN...
BIOMETRICS AUTHENTICATION TECHNIQUE FOR INTRUSION DETECTION SYSTEMS USING FIN...
 
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...
 
Effect of Interleaved FEC Code on Wavelet Based MC-CDMA System with Alamouti ...
Effect of Interleaved FEC Code on Wavelet Based MC-CDMA System with Alamouti ...Effect of Interleaved FEC Code on Wavelet Based MC-CDMA System with Alamouti ...
Effect of Interleaved FEC Code on Wavelet Based MC-CDMA System with Alamouti ...
 
FUZZY WEIGHTED ASSOCIATIVE CLASSIFIER: A PREDICTIVE TECHNIQUE FOR HEALTH CARE...
FUZZY WEIGHTED ASSOCIATIVE CLASSIFIER: A PREDICTIVE TECHNIQUE FOR HEALTH CARE...FUZZY WEIGHTED ASSOCIATIVE CLASSIFIER: A PREDICTIVE TECHNIQUE FOR HEALTH CARE...
FUZZY WEIGHTED ASSOCIATIVE CLASSIFIER: A PREDICTIVE TECHNIQUE FOR HEALTH CARE...
 
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALGENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
 
DETECTION OF CONCEALED WEAPONS IN X-RAY IMAGES USING FUZZY K-NN
DETECTION OF CONCEALED WEAPONS IN X-RAY IMAGES USING FUZZY K-NNDETECTION OF CONCEALED WEAPONS IN X-RAY IMAGES USING FUZZY K-NN
DETECTION OF CONCEALED WEAPONS IN X-RAY IMAGES USING FUZZY K-NN
 
META-HEURISTICS BASED ARF OPTIMIZATION FOR IMAGE RETRIEVAL
META-HEURISTICS BASED ARF OPTIMIZATION FOR IMAGE RETRIEVALMETA-HEURISTICS BASED ARF OPTIMIZATION FOR IMAGE RETRIEVAL
META-HEURISTICS BASED ARF OPTIMIZATION FOR IMAGE RETRIEVAL
 
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...
 
M-FISH KARYOTYPING - A NEW APPROACH BASED ON WATERSHED TRANSFORM
M-FISH KARYOTYPING - A NEW APPROACH BASED ON WATERSHED TRANSFORMM-FISH KARYOTYPING - A NEW APPROACH BASED ON WATERSHED TRANSFORM
M-FISH KARYOTYPING - A NEW APPROACH BASED ON WATERSHED TRANSFORM
 
RANDOMIZED STEGANOGRAPHY IN SKIN TONE IMAGES
RANDOMIZED STEGANOGRAPHY IN SKIN TONE IMAGESRANDOMIZED STEGANOGRAPHY IN SKIN TONE IMAGES
RANDOMIZED STEGANOGRAPHY IN SKIN TONE IMAGES
 
A NOVEL WINDOW FUNCTION YIELDING SUPPRESSED MAINLOBE WIDTH AND MINIMUM SIDELO...
A NOVEL WINDOW FUNCTION YIELDING SUPPRESSED MAINLOBE WIDTH AND MINIMUM SIDELO...A NOVEL WINDOW FUNCTION YIELDING SUPPRESSED MAINLOBE WIDTH AND MINIMUM SIDELO...
A NOVEL WINDOW FUNCTION YIELDING SUPPRESSED MAINLOBE WIDTH AND MINIMUM SIDELO...
 
CSHURI – Modified HURI algorithm for Customer Segmentation and Transaction Pr...
CSHURI – Modified HURI algorithm for Customer Segmentation and Transaction Pr...CSHURI – Modified HURI algorithm for Customer Segmentation and Transaction Pr...
CSHURI – Modified HURI algorithm for Customer Segmentation and Transaction Pr...
 
AN EFFICIENT IMPLEMENTATION OF TRACKING USING KALMAN FILTER FOR UNDERWATER RO...
AN EFFICIENT IMPLEMENTATION OF TRACKING USING KALMAN FILTER FOR UNDERWATER RO...AN EFFICIENT IMPLEMENTATION OF TRACKING USING KALMAN FILTER FOR UNDERWATER RO...
AN EFFICIENT IMPLEMENTATION OF TRACKING USING KALMAN FILTER FOR UNDERWATER RO...
 
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASE
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASEUSING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASE
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASE
 
FACTORS AFFECTING ACCEPTANCE OF WEB-BASED TRAINING SYSTEM: USING EXTENDED UTA...
FACTORS AFFECTING ACCEPTANCE OF WEB-BASED TRAINING SYSTEM: USING EXTENDED UTA...FACTORS AFFECTING ACCEPTANCE OF WEB-BASED TRAINING SYSTEM: USING EXTENDED UTA...
FACTORS AFFECTING ACCEPTANCE OF WEB-BASED TRAINING SYSTEM: USING EXTENDED UTA...
 
PROBABILISTIC INTERPRETATION OF COMPLEX FUZZY SET
PROBABILISTIC INTERPRETATION OF COMPLEX FUZZY SETPROBABILISTIC INTERPRETATION OF COMPLEX FUZZY SET
PROBABILISTIC INTERPRETATION OF COMPLEX FUZZY SET
 

Recently uploaded

complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIkoyaldeepu123
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 

Recently uploaded (20)

complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 

Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP

  • 1. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.5, October 2012
DOI : 10.5121/ijcseit.2012.2506

Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP

Sanjay Kumar Sharma1, Dr. Kusum Gupta2
1 Department of Computer Science, Banasthali University, Rajasthan, India
skumar2.sharma@gmail.com
2 Department of Computer Science, Banasthali University, Rajasthan, India
gupta_kusum@yahoo.com

Abstract

Multi-core architectures have become popular because they deliver high performance and process multiple tasks efficiently at the same time. Today's parallel algorithms therefore target multi-core systems, where the design of a parallel algorithm and the measurement of its performance are the major issues. If a single application is to execute faster, it must be divided into subtasks or threads that together deliver the desired result. Numerical problems, especially the solution of linear systems of equations, have many applications in science and engineering. This paper describes and analyzes parallel algorithms for solving a dense system of linear equations and for approximately computing the value of π using the OpenMP interface. The speedups of these parallel algorithms on a multi-core system are presented. Experimental results on a multi-core processor show that the proposed parallel algorithms achieve good speedup compared to their sequential counterparts.

Keywords

Multi-core Architecture, OpenMP, Parallel algorithms, Performance analysis.

1. Introduction

A number of tools are available for exposing parallelism in applications. Multithreading is a technique that allows multiple threads to execute in parallel. Computer architectures are classified by Flynn's taxonomy [1] according to instruction-level and data-level parallelism.
The performance of a parallel application can be improved using multi-core technology, which means having more than one core inside a single chip. This opens the way to parallel computation, where multiple parts of a program are executed at the same time [2]. The main factor motivating the design of parallel algorithms for multi-core systems is performance, which is sensitive to the number of cores available in the system, core-to-core latencies, the design of the memory hierarchy, and synchronization costs. Software development tools must abstract these variations so that software performance continues to obtain the benefits of Moore's law. In a multi-core environment the sequential computing paradigm is inefficient, while parallel computing is usually well suited. One of the most important numerical problems is the solution of a system of linear equations. Systems of linear equations arise in science and engineering domains
  • 2. such as fusion energy, structural engineering, and the method-of-moments formulation of Maxwell's equations.

2. Related work

With the advent of multi-core technology, parallel algorithms can be used to improve the performance of applications. Multi-core processors support multithreading, which executes multiple threads in parallel and thereby speeds up applications. We studied a number of algorithms on multi-core/parallel machines and the performance metrics of numerical algorithms [3], [4]. Research on hybrid multi-core and GPU architectures is also an emerging trend of the multi-core era [5]. Achieving high performance requires developing a correct parallel algorithm, suitable hardware, and a programming interface such as OpenMP, which supports multithreading; the program can then be structured so that all processors stay busy. The works cited above discuss the performance of such algorithms in general; our approach is to measure the performance of selected numerical algorithms on a multi-core system using OpenMP programming techniques.

3. Overview of Proposed Work

Some numerical problems are so large and complex that their solution takes a long time with a sequential algorithm, whether on a single-processor or a multiprocessor machine. Faster solutions can be obtained using parallel algorithms on a multi-core system. In this paper we select two numerical problems: the first is to approximately compute the value of π by numerical integration, and the second is the solution of a system of linear equations [6]. We describe the techniques and algorithms involved in achieving good performance by reducing execution time through OpenMP parallelization on a multi-core processor.
We tested the algorithms by writing programs using OpenMP on a multi-core system and measured their performance through their execution times. In our proposed work, we estimate the execution time taken by the sequential and parallel programs and also compute the speedup. The schematic diagram of our proposed work for the solution of a system of linear equations is shown in Fig. 1.

Figure 1 Modules of the parallel algorithm: the N linear equations are run through both sequential execution and parallel execution, the execution time of each is calculated, and the results are compared and analyzed.
4. Programming in OpenMP

The OpenMP Application Programming Interface (API) was developed to enable shared-memory parallel programming. The OpenMP API is a set of compiler directives, library routines, and environment variables for specifying shared-memory parallelism in Fortran and C/C++ programs [7]. It provides three kinds of directives (parallel work sharing, data environment, and synchronization) to exploit multi-core, multithreaded processors. OpenMP provides means for the programmer to create teams of threads for parallel execution, specify how to share work among the members of a team, declare both shared and private variables, and synchronize threads and enable them to perform certain operations exclusively [7]. OpenMP is based on the fork-and-join execution model, where a program starts as a single thread named the master thread [7]. This thread executes sequentially until the first parallel construct is encountered. The construct defines a parallel section, a block that can be executed by a number of threads in parallel. The master thread creates a team of threads that executes the statements contained in the parallel section concurrently. There is an implicit synchronization at the end of the parallel region, after which only the master thread continues its execution [7].

4.1 Creating an OpenMP Program

OpenMP directives are inserted into the program to tell the compiler which instructions to execute in parallel and how to distribute them among the threads [7]. The first step in creating a parallel program from a sequential one is to identify the parallelism it contains. This requires finding instructions, sequences of instructions, or even large sections of code that can be executed concurrently by different processors, and it is the crucial task when developing a parallel application.
The second step in creating an OpenMP program is to express, using OpenMP, the parallelism that has been identified [7]. A huge practical benefit of OpenMP is that it can be applied incrementally to create a parallel program from an existing sequential code. The developer can insert directives into one portion of the program and leave the rest in its sequential form. Once the resulting program version has been successfully compiled and tested, another portion of the code can be parallelized. The programmer can stop this process once the desired speedup has been obtained [7].

5. Performance of Parallel Algorithms

The amount of performance benefit an application will realize by using OpenMP depends entirely on the extent to which it can be parallelized. Amdahl's law specifies the maximum speedup that can be expected from parallelizing portions of a serial program [8]. Essentially, it states that the maximum speedup S of a program is

S = 1 / ((1 - F) + F / N)

where F is the fraction of the total serial execution time taken by the portion of code that can be parallelized and N is the number of processors over which the parallel portion of the code runs. The metric that has been used to evaluate the performance of the parallel algorithms is the speedup [8]. It is defined as

Sp = T1 / Tp
where T1 denotes the execution time of the best known sequential algorithm on a single-processor machine, and Tp is the execution time of the parallel algorithm on a P-processor machine. In other words, the speedup states how much faster the parallel algorithm is than the corresponding sequential algorithm. The linear or ideal speedup is obtained when Sp = P.

6. Proposed Methodology

6.1 Calculation of π

Numerical integration can be used to approximately compute the value of π. Consider the evaluation of the definite integral of f(x) over [a, b], where f(x) is called the integrand, a is the lower limit, and b is the upper limit of integration. The integral of f(x) can be evaluated numerically by splitting the interval [a, b] into n equally spaced subintervals. We assume that the subintervals are of constant width h = (b - a)/n. A method of integration (Simpson's 1/3 rule) has been used to approximately compute the value of π. The sequential algorithm is given below.

Sequential Algorithm
1. Input N // N no. of intervals, a=0 and b=1 are limits
2. h = (b-a) / N
3. sum = f(a) + f(b) + 4*f(a+h)
4. for i = 3 to N-1 step +2 do
5.   x = 2*f(a + (i-1)*h) + 4*f(a + i*h)
6.   sum = sum + x
7. End loop [i]
8. Int = (h/3) * sum
9. Print 'Integral is = ', Int

6.2 Solution of a System of Linear Equations

A solution of a system of linear equations is an assignment of values to the variables that satisfies the equations. To solve the system of linear equations, we considered a direct method: Gaussian elimination. It is a numerical method for solving the system of linear equations AX = B, where A is a known matrix of size n×n, X is the required solution vector, and B is a known vector of size n. Consider the n linear equations in n unknowns:

a11x1 + a12x2 + … + a1nxn = a1,n+1
a21x1 + a22x2 + … + a2nxn = a2,n+1
a31x1 + a32x2 + … + a3nxn = a3,n+1
… … …
an1x1 + an2x2 + … + annxn = an,n+1    (1)

where ai,j and ai,n+1 are known constants and the xi are unknowns. A parallel algorithm to solve a dense system of linear equations using Gaussian elimination with partial pivoting is developed. The Gauss method is based on transformations of the system of linear equations that do not change the solution [9][10][11]: multiplication of any equation by a nonzero constant, permutation of equations, and addition of any equation of the system to another equation. The algorithm consists of two phases:

1. In the first phase, the pivot element is identified as the largest absolute value among the coefficients in the first column, and the first row is exchanged with the row containing that element. The first variable is then eliminated from the subsequent equations using the transformations above. When the second row becomes the pivot row, the coefficients in the second column from the second row to the nth row are searched to locate the largest coefficient, and the second row is exchanged with the row containing it. This procedure continues until (n-1) unknowns have been eliminated [11].

2. The second phase, known as the backward substitution phase, is concerned with the actual solution of the equations and applies the back substitution process to the reduced upper triangular system [11].

The sequential algorithm of Gauss elimination is given below:

Sequential Algorithm
1. Input: matrix a[1:n, 1:n+1]
2. Output: x[1:n]
3. // Triangularization process
4. for k = 1 to n-1
5.   Find the pivot row by searching for the greatest absolute value among the elements of column k
6.   Swap the pivot row with row k
7.   for i = k+1 to n
8.     mi,k = ai,k / ak,k
9.     for j = k to n+1
10.      ai,j = ai,j - mi,k * ak,j
11.    End loop [j]
12.  End loop [i]
13. End loop [k]
14. // Back substitution process
15. xn = an,n+1 / an,n
16. for i = n-1 to 1 step -1 do
17.   sum = 0
18.   for j = i+1 to n do
19.     sum = sum + ai,j * xj
20.   End loop [j]
21.   xi = (ai,n+1 - sum) / ai,i
22. End loop [i]
7. Design and Implementation

First we studied the typical behavior of the sequential algorithms and identified the sections of operation that can be executed in parallel. In the designed parallel algorithms we used the #pragma directive, which indicates that the iterations of a loop will execute in parallel on different processors. The parallel algorithms for both problems are given below.

We computed the value of π using numerical integration. Numerical integration can be parallelized at six levels; we used the most efficient of these, parallelization of the integration formula, to compute the value of the definite integral. In the parallel algorithm we inserted the #pragma directive to parallelize the loop.

Parallel Algorithm: Computation of the Value of π
1. Input N // N no. of intervals, a=0 and b=1 are limits
2. h = (b-a) / N
3. Start the clock
4. sum = f(a) + f(b) + 4*f(a+h)
5. Insert #pragma directive to parallelize loop i
6. for i = 3 to N-1 step +2 do
7.   x = 2*f(a + (i-1)*h) + 4*f(a + i*h)
8.   sum = sum + x
9. End loop [i]
10. Int = (h/3) * sum
11. Stop the clock
12. Display the time taken and the value of Int

In the sequential algorithm of Gauss elimination we found that the inner loops indexed by i and j can be executed in parallel without affecting the result. In the parallel algorithm, we insert the #pragma directive to parallelize the loops.

Parallel Algorithm: Gauss Elimination
1. Input: a[1:n, 1:n+1] // Read the matrix data
2. Output: x[1:n]
3. Set the number of threads
4. Start the clock
5. // Triangularization process
6. for k = 1 to n-1
7.   Insert #pragma directive to parallelize loop i
8.   for i = k+1 to n
9.     mi,k = ai,k / ak,k
10.    for j = k to n+1
11.      ai,j = ai,j - mi,k * ak,j
12.    End loop [j]
13.  End loop [i]
14. End loop [k]
15. // Back substitution process
16. xn = an,n+1 / an,n
17. for i = n-1 to 1 step -1 do
18.   sum = 0
19.   for j = i+1 to n do
20.     sum = sum + ai,j * xj
21.   End loop [j]
22.   xi = (ai,n+1 - sum) / ai,i
23. End loop [i]
24. Stop the clock
25. Display the time taken and the solution vector

8. Experimental Results

There are two versions of each algorithm: sequential and parallel. The programs were executed on an Intel Core 2 Duo machine. We analyzed the performance using the results and finally derived the conclusions. The Intel C++ Compiler 10.0 under Microsoft Visual Studio 8.0 was used for compilation and execution; it supports multithreaded parallelism with the /Qopenmp flag. The Origin 6.1 software was used to plot the graphs from the data obtained in the experiments.

In the first experiment, the execution times of both the sequential and parallel algorithms were recorded to measure the performance (speedup) of the parallel algorithm against the sequential one. We used the value of π from Mathematica to check the accuracy of the computed value; Mathematica is known for its ability to do computations with arbitrary precision. The data presented in Table 1 show the execution times taken by the sequential and parallel programs, the differences between the computed values and Mathematica's value of π, and the speedup. We plot the graph using the data in Table 1 to analyze the performance of the parallel algorithm, which is shown in Fig. 2. The results show a large difference between the time required by the parallel algorithm and the time taken by the sequential algorithm: the parallel algorithm is approximately twice as fast.

Table 1 Performance comparison of the sequential and parallel algorithms for computing the value of π

Sr. No. | No. of Intervals | Sequential Time (ms) | Difference | Parallel Time (ms) | Difference | Speedup (S)
1 | 1000    | 0  | -1.67*10^-7  | 0  | -1.67*10^-7  | 0
2 | 10000   | 0  | -1.67*10^-9  | 0  | -1.67*10^-9  | 0
3 | 100000  | 15 | -1.67*10^-11 | 8  | -1.67*10^-11 | 1.875
4 | 1000000 | 78 | -4.44*10^-16 | 40 | -2.04*10^-16 | 1.950
Figure 2 Execution times of the sequential and parallel computation of π

In the second experiment, we implemented the sequential and parallel algorithms for finding the solution of a system of linear equations. We tested both algorithms on systems of equations of different sizes and recorded their execution times. In the programs we used the function timeGetTime() to measure the time taken by the sequential and parallel algorithms. The data in Table 2 represent the execution times taken by the sequential and parallel programs for systems of linear equations of different sizes, together with the speedup. The results show that the parallel algorithm is more efficient than its sequential counterpart. We plot the graph using the data in Table 2 to analyze the performance (speedup) of the parallel algorithm, which is presented in Fig. 3. It shows that the parallel algorithm saves a significant amount of execution time. The speedup of the parallel algorithm is on average approximately twice that of the corresponding sequential algorithm.

Table 2 Performance comparison of the sequential and parallel algorithms

No. of Equations | Sequential Time (ms) | Parallel Time (ms) | Speedup (S)
100 | 0      | 0      | 0
125 | 31.5   | 16.50  | 1.90
150 | 46.5   | 24.25  | 1.91
175 | 70.5   | 36.50  | 1.93
200 | 124.5  | 63.50  | 1.96
225 | 183.25 | 92.25  | 1.98
250 | 249.5  | 125.25 | 1.990
275 | 327.5  | 165.50 | 1.97
300 | 379.25 | 190.00 | 1.996
Figure 3 Execution times of the sequential and parallel Gauss elimination algorithms

9. Conclusions and Future Enhancement

In this work we studied how OpenMP programming techniques benefit a multi-core system. We computed the value of π and solved systems of linear equations using OpenMP to improve performance by reducing execution time, and we presented the execution times of both the serial and parallel algorithms for each problem. The work has successfully computed the value of π and the solution of a system of linear equations using OpenMP on a multi-core machine. Based on our study we arrive at the following conclusions: (1) parallelizing a serial algorithm using OpenMP increases performance; (2) on a multi-core system OpenMP provides a considerable performance increase, and the parallelization can be done with careful, small changes; (3) the parallel algorithm is approximately twice as fast as the sequential one, and the speedup is close to linear. Future enhancement of this work is worthwhile, as parallelization using OpenMP is gaining popularity. This work will be carried out in the near future for real-time implementations at larger scale and for high-performance systems.

References

[1] Noronha R., Panda D. K., "Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support", IEEE Computer, 2007.
[2] Kulkarni S. G., "Analysis of Multi-Core System Performance through OpenMP", National Conference on Advanced Computing and Communication Technology, IJTES, Vol. 1, No. 2, pp. 189-192, July-Sep 2010.
[3] Gallivan K. A., Plemmons R. J., and Sameh A. H., "Parallel algorithms for dense linear algebra computations", SIAM Review, vol. 32, pp. 54-135, March 1990.
[4] Gallivan K. A., Jalby W., Malony A. D., and Wijshoff H. A. G., "Performance prediction for parallel numerical algorithms", International Journal of High Speed Computing, vol. 3, no. 1, pp. 31-62, 1991.
[5] Horton M., Tomov S., and Dongarra J., "A class of hybrid LAPACK algorithms for multi-core and GPU architectures", in Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC '11), Washington, DC, USA, pp. 150-158, IEEE Computer Society, 2011.
[6] Wilkinson B., Allen M., "Parallel Programming", Pearson Education (Singapore), 2002.
[7] Chapman B., Jost G., van der Pas R., "Using OpenMP: Portable Shared Memory Parallel Programming", The MIT Press, Cambridge, Massachusetts, 2008.
[8] Quinn M. J., "Parallel Programming in C with MPI and OpenMP", McGraw-Hill Higher Education, 2004.
[9] Angadi S. H., Raju G. T., and Abhishek B., "Software Performance Analysis with Parallel Programming Approaches", International Journal of Computer Science and Informatics, ISSN (Print) 2231-5292, Vol. 1, Iss. 4, 2012.
[10] Bertsekas D. P., Tsitsiklis J. N., "Parallel and Distributed Computation: Numerical Methods", Prentice-Hall, 1989.
[11] Vijayalakshmi S., Mohan R., Mukund S., Kothari D. P., "LINPACK: Power-Performance Analysis of Multi-Core Processors Using OpenMP", International Journal of Computer Applications, Vol. 42, No. 1, April 2012.