Algorithms
Parallel Algorithms
By: Sandeep Kumar Poonia
Asst. Professor, Jagannath University, Jaipur
1
What is Parallelism in Computers?
Parallelism means a digital computer performing more
than one task at the same time.
Examples
• IO chips : Most computers contain special
circuits for IO devices which allow some tasks to
be performed in parallel
• Pipelining of instructions : Some CPUs pipeline
the execution of instructions
2
Example………
• Multiple arithmetic units (AUs) : Some CPUs
contain multiple AUs, so they can perform more
than one arithmetic operation at the same
time.
• We are interested in parallelism involving
more than one CPU
3
Common Terms for Parallelism
• Concurrent Processing: A program is divided
into multiple processes which are run on a
single processor. The processes are time-sliced
on the single processor
• Distributed Processing: A program is divided
into multiple processes which are run on
multiple distinct machines. The multiple
machines are usually connected by a LAN.
The machines used are typically workstations
running multiple programs
4
Common Terms for Parallelism….
• Parallel Processing: A program is divided into
multiple processes which are run on multiple
processors. The processors normally:
– are in one machine
– execute one program at a time
– have high speed communications between them
5
Parallel Programming
• Issues in parallel programming not found in
sequential programming
• Task decomposition, allocation and
sequencing
• Breaking down the problem into smaller tasks
(processes) that can be run in parallel
• Allocating the parallel tasks to different
processors
• Sequencing the tasks in the proper order
• Using the processors efficiently
6
Parallel Programming
• Communication of interim results between
processors: The goal is to reduce the cost of
communication between processors. Task
decomposition and allocation affect
communication costs
• Synchronization of processes: Some processes
must wait at predetermined points for results
from other processes.
• Different machine architectures
7
Performance Issues
• Scalability: Using more nodes should allow a job to
run faster, or allow a larger job to run in the same time
• Load Balancing: All nodes should have the same
amount of work; avoid having nodes idle while
others are computing
• Communication bottlenecks: Too many messages
are traveling on the same path
• Serial bottlenecks: Message passing is slower
than computation
8
Parallel Machines
Parameters used to describe or classify parallel
computers:
• Type and number of processors
• Processor interconnections
• Global control
• Synchronous vs. asynchronous operation
9
Type and number of processors
• Massively parallel : Computer systems with
thousands of processors
• Ex: Parallel Supercomputers CM-5, Intel
Paragon
• Coarse-grained parallelism : A few (~10)
processors, usually high-powered, in a system
10
Processor interconnections
Parallel computers may be loosely divided into
two groups:
• Shared Memory (or Multiprocessor)
• Message Passing (or Multicomputers)
11
12
A simple parallel algorithm
Adding n numbers in parallel
13
A simple parallel algorithm
• Example for 8 numbers: We start with 4 processors and
each of them adds 2 items in the first step.
• The number of items is halved at every subsequent step.
Hence log n steps are required for adding n numbers.
The processor requirement is O(n).
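To make the scheme concrete, here is a minimal Python sketch that simulates the rounds sequentially (the function name parallel_sum is ours, not from the slides; a real parallel machine would perform each round's additions simultaneously):

def parallel_sum(items):
    # Simulate the tree algorithm: in each round, processor i
    # adds the pair (items[2i], items[2i+1]), halving the list.
    assert len(items) & (len(items) - 1) == 0, "assume n is a power of 2"
    while len(items) > 1:
        items = [items[2 * i] + items[2 * i + 1]
                 for i in range(len(items) // 2)]
    return items[0]

print(parallel_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36, after log 8 = 3 rounds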
We have omitted many details from our description of the algorithm.
• How do we allocate tasks to processors?
• Where is the input stored?
• How do the processors access the input as well as intermediate
results?
We do not ask these questions while designing sequential algorithms.
14
How do we analyze a parallel
algorithm?
A parallel algorithm is analyzed mainly in terms of its
time, processor and work complexities.
• Time complexity T(n) : How many time steps are needed?
• Processor complexity P(n) : How many processors are used?
• Work complexity W(n) : What is the total work done by all
the processors? Hence, W(n) = P(n) × T(n).
For our example: T(n) = O(log n)
P(n) = O(n)
W(n) = O(n log n)
15
How do we judge efficiency?
• Consider two parallel algorithms A1 and A2 for the same problem.
A1: W1(n) work in T1(n) time.
A2: W2(n) work in T2(n) time.
• We say A1 is more efficient than A2 if W1(n) = o(W2(n)),
regardless of their time complexities.
For example, W1(n) = O(n) and W2(n) = O(n log n)
• If W1(n) and W2(n) are asymptotically the same, then A1 is more
efficient than A2 if T1(n) = o(T2(n)).
For example, W1(n) = W2(n) = O(n), but
T1(n) = O(log n), T2(n) = O(log² n)
16
How do we judge efficiency?
• It is difficult to give a more formal definition of
efficiency.
Consider the following situation.
For A1 , W1(n) = O(n log n) and T1(n) = O(n).
For A2 , W2(n) = O(n log² n) and T2(n) = O(log n)
• It is difficult to say which one is the better algorithm.
Though A1 is more efficient in terms of work, A2 runs
much faster.
• Both algorithms are interesting and one may be better
than the other depending on a specific parallel machine.
17
Optimal parallel algorithms
• Consider a problem, and let T(n) be the worst-case time
upper bound on a serial algorithm for an input of length
n.
• Assume also that T(n) is the lower bound for solving the
problem. Hence, we cannot have a better upper bound.
• Consider a parallel algorithm for the same problem that
does W(n) work in Tpar(n) time.
The parallel algorithm is work-optimal if W(n) = O(T(n)).
It is work-time-optimal if Tpar(n) cannot be improved.
18
A simple parallel algorithm
Adding n numbers in parallel
19
A work-optimal algorithm for adding n
numbers
Step 1.
• Use only n/log n processors and assign log n numbers to
each processor.
• Each processor adds log n numbers sequentially in O(log n)
time.
Step 2.
• We have only n/log n numbers left. We now execute our
original algorithm on these n/log n numbers.
• Now T(n) = O(log n) and
W(n) = O((n/log n) × log n) = O(n)
20
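A sketch of this two-step scheme in Python (our own illustration, assuming n is a power of two; step 2 reuses the tree algorithm from the earlier sketch):

import math

def work_optimal_sum(items):
    n = len(items)
    chunk = max(1, int(math.log2(n)))  # log n numbers per processor
    # Step 1: each of the n/log n processors sums its chunk
    # sequentially in O(log n) time.
    partial = [sum(items[i:i + chunk]) for i in range(0, n, chunk)]
    # Step 2: tree-add the n/log n partial sums in O(log n) more steps.
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0)  # pad so the pairs line up
        partial = [partial[2 * i] + partial[2 * i + 1]
                   for i in range(len(partial) // 2)]
    return partial[0]

print(work_optimal_sum(list(range(1, 17))))  # 136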
Why is parallel computing important?
• We can justify the importance of parallel computing for
two reasons.
Very large application domains, and
Physical limitations of VLSI circuits
• Though computers are getting faster and faster, user
demand for solving very large problems is growing at a
still faster rate.
• Some examples include weather forecasting, simulation
of protein folding, computational physics, etc.
21
Physical limitations of VLSI circuits
• The Pentium III processor uses 180-nanometer (nm) technology, i.e.,
a circuit element like a transistor can be etched within
180 × 10⁻⁹ m.
• The Pentium IV processor uses 130 nm technology.
• Intel has recently trialed processors made using 65 nm
technology.
22
How many transistors can we pack?
• Pentium III has about 42 million transistors and
Pentium IV about 55 million transistors.
• The number of transistors on a chip is approximately
doubling every 18 months (Moore’s Law).
• There are now 100 transistors for every ant on Earth
23
Physical limitations of VLSI circuits
• All semiconductor devices are Si-based. It is fairly safe to assume
that a circuit element will take at least a single Si atom.
• The covalent bond in Si has a bond length of approximately 0.2 nm.
• Hence, we will reach the limit of miniaturization very soon.
• The upper bound on the speed of electronic signals is 3 × 10⁸ m/sec,
the speed of light.
• Hence, communication between two adjacent transistors will take
approximately 10⁻¹⁸ sec (about 0.3 × 10⁻⁹ m ÷ 3 × 10⁸ m/sec).
• If we assume that a floating point operation involves switching of at
least a few thousand transistors, such an operation will take about
10⁻¹⁵ sec in the limit.
• Hence, we are looking at 1000-teraflop machines at the peak of this
technology. (TFLOPS: 10¹² FLOPS)
1 flop = a floating point operation
This is a very optimistic scenario.
24
Other Problems
• The most difficult problem is to control power dissipation.
• 75 watts is considered the maximum power output of a
processor.
• As we pack in more transistors, the power output goes up and
better cooling is necessary.
• Intel cooled its 8 GHz demo processor using liquid nitrogen!
25
The advantages of parallel computing
• Parallel computing offers the possibility of overcoming such
physical limits by solving problems in parallel.
• In principle, thousands, even millions of processors can be
used to solve a problem in parallel and today’s fastest
parallel computers have already reached teraflop speeds.
• Today’s microprocessors are already using several parallel
processing techniques like instruction level parallelism,
pipelined instruction fetching etc.
• Intel uses hyper-threading in the Pentium IV mainly because the
processor is clocked at 3 GHz, but the memory bus operates
only at about 400-800 MHz.
26
Problems in parallel computing
• The sequential or uni-processor computing
model is based on von Neumann’s stored
program model.
• A program is written, compiled and stored in
memory and it is executed by bringing one
instruction at a time to the CPU.
27
Problems in parallel computing
• Programs are written keeping this model in mind.
Hence, there is a close match between the software
and the hardware on which it runs.
• The theoretical RAM model captures these concepts
nicely.
• There are many different models of parallel computing
and each model is programmed in a different way.
• Hence, an algorithm designer has to keep a
specific model in mind when designing an algorithm.
• Most parallel machines are suitable for solving specific
types of problems.
• Designing operating systems is also a major issue.
28
The PRAM model
n processors are connected to a shared memory.
29
The PRAM model
• Each processor should be able to access any
memory location in each clock cycle.
• Hence, there may be conflicts in memory
access. Also, memory management hardware
needs to be very complex.
• We need some kind of hardware to connect
the processors to individual locations in the
shared memory.
30
The PRAM model
A more realistic PRAM model
Models of parallel computation
Parallel computational models can be
broadly classified into two categories,
• Single Instruction Multiple Data (SIMD)
• Multiple Instruction Multiple Data (MIMD)
31
Models of parallel computation
• SIMD models are used for solving
problems which have regular structures.
We will mainly study SIMD models in this
course.
• MIMD models are more general and used
for solving problems which lack regular
structures.
32
SIMD models
An N-processor SIMD computer has the
following characteristics:
• Each processor can store both program
and data in its local memory.
• Each processor stores an identical copy
of the same program in its local memory.
33
SIMD models
• At each clock cycle, each processor
executes the same instruction from this
program. However, the data are different
in different processors.
• The processors communicate among
themselves either through an
interconnection network or through a
shared memory.
34
Design issues for network
SIMD models
• A network SIMD model is a graph. The
nodes of the graph are the processors
and the edges are the links between the
processors.
• Since each processor solves only a small
part of the overall problem, it is necessary
that processors communicate with each
other while solving the overall problem.
35
Design issues for network
SIMD models
• The main design issues for network SIMD
models are communication diameter,
bisection width, and scalability.
• We will discuss two most popular network
models, mesh and hypercube in this
lecture.
36
Communication diameter
• The communication diameter is the diameter
of the graph that represents the network
model. The diameter of a graph is the longest
shortest-path distance between any pair of nodes.
• If the diameter for a model is d, the lower
bound for any computation on that model
is Ω(d).
37
Communication diameter
• The data can be distributed in such a way
that the two furthest nodes may need to
communicate.
38
Communication diameter
Communication between two furthest
nodes takes Ω(d) time steps.
39
Bisection width
• The bisection width of a network model is
the minimum number of links that must be
removed to decompose the graph into two
equal parts.
• If the bisection width is large, more
information can be exchanged between
the two halves of the graph and hence
problems can be solved faster.
40
Bisection width
Dividing the graph into two parts.
41
Scalability
• A network model must be scalable so that
more processors can be easily added
when new resources are available.
• The model should be regular so that each
processor has a small number of links
incident on it.
42
Scalability
• If the number of links is large for each
processor, it is difficult to add new
processors as too many new links have to
be added.
• If we want to keep the diameter small, we
need more links per processor. If we want
our model to be scalable, we need fewer
links per processor.
43
Diameter and Scalability
• The best model in terms of diameter is the
complete graph. The diameter is 1.
However, if we need to add a new node to
an n-processor machine, we need n - 1
new links.
44
Diameter and Scalability
• The best model in terms of scalability is
the linear array. We need to add only one
link for a new processor. However, the
diameter is n for a machine with n
processors.
45
The mesh architecture
• Each internal processor of a 2-dimensional
mesh is connected to 4 neighbors.
• When we combine two different meshes,
only the processors on the boundary need
extra links. Hence it is highly scalable.
46
The mesh architecture
• Both the diameter and bisection width of an
n-processor, 2-dimensional mesh are O(√n)
A 4 x 4 mesh
47
The hypercube architecture
Hypercubes of 0, 1, 2 and 3 dimensions
48
• The diameter of a d-dimensional
hypercube is d as we need to flip at most d
bits (traverse d links) to reach one
processor from another.
• The bisection width of a d-dimensional
hypercube is 2^(d-1).
The hypercube architecture
49
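Since the shortest-path distance between two hypercube nodes equals the Hamming distance between their binary labels, the diameter claim can be checked directly; a small sketch of our own:

def hypercube_diameter(d):
    # Nodes are labelled 0 .. 2^d - 1; two nodes are adjacent iff their
    # labels differ in one bit, so distance = Hamming distance of labels.
    nodes = range(2 ** d)
    return max(bin(u ^ v).count("1") for u in nodes for v in nodes)

print(hypercube_diameter(3))  # 3: we flip at most d bits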
• The hypercube is a highly scalable
architecture. Two d-dimensional
hypercubes can be easily combined to
form a d+1-dimensional hypercube.
• The hypercube has several variants like
butterfly, shuffle-exchange network and
cube-connected cycles.
The hypercube architecture
50
Adding n numbers on the mesh
Adding n numbers in √n steps
51
52
Adding n numbers on the hypercube
Adding n numbers in log n steps
53
54
55
Complexity Analysis: Given n processors
connected via a hypercube, S_Sum_Hypercube needs
log n rounds to compute the sum. Since n messages
are sent and received in each round, the total number of
messages is O(n log n).
1. Time complexity: O(log n).
2. Message complexity: O(n log n).
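A sequential Python simulation of the log n rounds may help (this is our rendering of the idea behind S_Sum_Hypercube, not the slides' code): in round l, every processor whose l-th address bit is 1 sends its value across dimension l, and the receiving neighbor adds it.

def hypercube_sum(values):
    # values[i] is held by processor i; len(values) must be 2^d.
    n = len(values)
    d = n.bit_length() - 1
    vals = list(values)
    for l in range(d):  # log n rounds
        for i in range(n):
            if i & (1 << l):  # bit l set: send to neighbor across dim l
                vals[i ^ (1 << l)] += vals[i]
    return vals[0]  # processor 0 ends up holding the total

print(hypercube_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36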
Classification of the PRAM model
• In the PRAM model, processors
communicate by reading from and writing
to the shared memory locations.
56
Classification of the PRAM
model
• The power of a PRAM depends on the
kind of access to the shared memory
locations.
57
Classification of the PRAM
model
In every clock cycle,
• In the Exclusive Read Exclusive Write
(EREW) PRAM, each memory location
can be accessed only by one processor.
• In the Concurrent Read Exclusive Write
(CREW) PRAM, multiple processors can
read from the same memory location, but
only one processor can write.
58
Classification of the PRAM
model
• In the Concurrent Read Concurrent Write
(CRCW) PRAM, multiple processors can
read from or write to the same memory
location.
59
Classification of the PRAM
model
• It is easy to allow concurrent reading.
However, concurrent writing gives rise to
conflicts.
• If multiple processors write to the same
memory location simultaneously, it is not
clear what is written to the memory
location.
60
Classification of the PRAM
model
• In the Common CRCW PRAM, all the
processors must write the same value.
• In the Arbitrary CRCW PRAM, one of the
processors arbitrarily succeeds in writing.
• In the Priority CRCW PRAM, processors
have priorities associated with them and
the highest priority processor succeeds in
writing.
61
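A toy sketch of the three write rules (our own illustration; writes is a list of (processor id, value) pairs aimed at a single memory cell, and we take a lower id to mean higher priority):

def crcw_write(writes, mode):
    # writes: [(processor_id, value), ...] all targeting one cell.
    if mode == "common":
        values = {v for _, v in writes}
        assert len(values) == 1, "Common CRCW: all writers must agree"
        return values.pop()
    if mode == "arbitrary":
        return writes[0][1]  # any single write may succeed
    if mode == "priority":
        return min(writes)[1]  # lowest id = highest priority wins

print(crcw_write([(2, 7), (0, 7)], "common"))    # 7
print(crcw_write([(2, 9), (0, 4)], "priority"))  # 4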
Classification of the PRAM
model
• The EREW PRAM is the weakest and the
Priority CRCW PRAM is the strongest
PRAM model.
• The relative powers of the different PRAM
models are as follows.
62
Classification of the PRAM
model
• An algorithm designed for a weaker
model can be executed within the same
time and work complexities on a
stronger model.
63
Classification of the PRAM
model
• We say model A is less powerful
than model B if either:
• the time complexity for solving a
problem is asymptotically less in model
B than in model A, or
• the time complexities being the same,
the processor or work complexity is
asymptotically less in model B than in
model A.
64
Classification of the PRAM
model
An algorithm designed for a stronger PRAM
model can be simulated on a weaker model
either with asymptotically more processors
(work) or with asymptotically more time.
65
Adding n numbers on a PRAM
66
Adding n numbers on a PRAM
• This algorithm works on the EREW PRAM
model as there are no read or write
conflicts.
• We will use this algorithm to design a
matrix multiplication algorithm on the
EREW PRAM.
67
Matrix multiplication
For simplicity, we assume that n = 2^p for some integer p.
68
Matrix multiplication
• Each ci,j, 1 ≤ i, j ≤ n, can be computed in
parallel.
• We allocate n processors for computing ci,j.
Suppose these processors are P1, P2,…,Pn.
• In the first time step, processor Pm, 1 ≤ m ≤ n,
computes the product ai,m × bm,j.
• We now have n numbers and we use the
addition algorithm to sum these n numbers
in log n time.
69
Matrix multiplication
• Computing each ci,j, 1 ≤ i, j ≤ n, takes n
processors and log n time.
• Since there are n² such ci,j's, we need
overall O(n³) processors and O(log n)
time.
• The processor requirement can be
reduced to O(n³/log n). Exercise!
• Hence, the work complexity is O(n³).
70
Matrix multiplication
• However, this algorithm requires
concurrent read capability.
• Note that each element ai,j (and bi,j)
participates in computing n elements of
the C matrix.
• Hence n different processors will try to
read each ai,j (and bi,j) in our algorithm.
71
Matrix multiplication
For simplicity, we assume that n = 2^p for some integer p.
72
Matrix multiplication
• Hence our algorithm runs on the CREW
PRAM and we need to avoid the read
conflicts to make it run on the EREW
PRAM.
• We will create n copies of each of the
elements ai,j (and bi,j). Then one copy can
be used for computing each ci,j .
73
Matrix multiplication
Creating n copies of a number in O(log n)
time using O(n) processors on the EREW
PRAM.
• In the first step, one processor reads the
number and creates a copy. Hence, there
are two copies now.
• In the second step, two processors read
these two copies and create four copies.
74
Matrix multiplication
• Since the number of copies doubles in
every step, n copies are created in
O(log n) steps.
• Though we need n processors, the
processor requirement can be reduced to
O(n/log n).
75
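A sketch of the doubling idea (ours; assuming n is a power of two). In step s, each of the 2^(s-1) existing copies is read by exactly one processor, which writes one new copy, so no cell is read or written concurrently:

def erew_copies(x, n):
    copies = [x]  # one processor reads x and writes the first copy
    while len(copies) < n:  # each step doubles the number of copies
        copies.extend(copies[:n - len(copies)])
    return copies

print(erew_copies(42, 8))  # eight copies after log 8 = 3 steps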
Matrix multiplication
• Since there are n² elements in the matrix A
(and in B), we need O(n³/log n)
processors and O(log n) time to create n
copies of each element.
• After this, there are no read conflicts in our
algorithm. The overall matrix multiplication
algorithm now takes O(log n) time and
O(n³/log n) processors on the EREW
PRAM.
76
Matrix multiplication
• The memory requirement is of course
much higher for the EREW PRAM.
77
78
Using n³ Processors
Algorithm MatMult_CREW
/* Step 1 */
forall Pi,j,k, where 1 ≤ i, j, k ≤ n, do in parallel
    C[i,j,k] = A[i,k] * B[k,j]
endfor
/* Step 2 */
for l = 1 to log n do
    forall Pi,j,k, where 1 ≤ i, j ≤ n and 1 ≤ k ≤ n/2, do in parallel
        if (2k mod 2^l) = 0 then
            C[i,j,2k] ← C[i,j,2k] + C[i,j, 2k − 2^(l−1)]
        endif
    endfor
endfor
/* The output matrix is stored in locations C[i,j,n], where 1 ≤ i, j ≤ n */
79
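For concreteness, a sequential Python rendering of the same two steps (our own sketch; the nested loops stand in for the parallel forall, and indices are shifted to 0-based):

import math

def matmult_crew(A, B):
    n = len(A)  # assume n is a power of two
    # Step 1: processor P(i,j,k) computes one product each, in parallel.
    C = [[[A[i][k] * B[k][j] for k in range(n)]
          for j in range(n)] for i in range(n)]
    # Step 2: pairwise tree addition along k in log n iterations.
    for l in range(1, int(math.log2(n)) + 1):
        for i in range(n):
            for j in range(n):
                for k in range(1, n // 2 + 1):
                    if (2 * k) % (2 ** l) == 0:
                        C[i][j][2 * k - 1] += C[i][j][2 * k - 1 - 2 ** (l - 1)]
    # c(i,j) ends up in location C[i,j,n] (here index n-1).
    return [[C[i][j][n - 1] for j in range(n)] for i in range(n)]

print(matmult_crew([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]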
Complexity Analysis
• In the first step, the products are computed in parallel
in constant time, that is, O(1).
• These products are summed in O(log n) time during
the second step. Therefore, the run time is O(log n).
• Since the number of processors used is n³, the cost is
O(n³ log n).
1. Run time, T(n) = O(log n).
2. Number of processors, P(n) = n³.
3. Cost, C(n) = O(n³ log n).
80
Reducing the Number of Processors
In the above algorithm, although all the processors were
busy during the first step, not all of them performed
addition operations during the second step.
 The second step consists of log n iterations.
 During the first iteration, only n³/2 processors performed
addition operations; only n³/4 performed addition
operations in the second iteration, and so on.
 With this understanding, we may be able to use a smaller
machine with only n³/log n processors.
81
Reducing the Number of Processors
1. Each processor Pi,j,k, where 1 ≤ i, j ≤ n and
1 ≤ k ≤ n/log n, computes the sum of log n
products. This step produces n³/log n partial sums.
2. The sums of products produced in step 1 are
added to produce the resulting matrix as
discussed before.
82
Complexity Analysis
1. Run time, T(n) = O(log n).
2. Number of processors, P(n) = n³/log n.
3. Cost, C(n) = O(n³).

More Related Content

What's hot

distributed shared memory
 distributed shared memory distributed shared memory
distributed shared memoryAshish Kumar
 
Parallel computing persentation
Parallel computing persentationParallel computing persentation
Parallel computing persentation
VIKAS SINGH BHADOURIA
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed Computing
Sayed Chhattan Shah
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel ProgrammingUday Sharma
 
Feng’s classification
Feng’s classificationFeng’s classification
Feng’s classification
Narayan Kandel
 
Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]Ravindra Raju Kolahalam
 
Daa notes 1
Daa notes 1Daa notes 1
Daa notes 1
smruti sarangi
 
Graph coloring using backtracking
Graph coloring using backtrackingGraph coloring using backtracking
Graph coloring using backtracking
shashidharPapishetty
 
Disk scheduling
Disk schedulingDisk scheduling
Disk scheduling
NEERAJ BAGHEL
 
Load Balancing In Distributed Computing
Load Balancing In Distributed ComputingLoad Balancing In Distributed Computing
Load Balancing In Distributed Computing
Richa Singh
 
INTER PROCESS COMMUNICATION (IPC).pptx
INTER PROCESS COMMUNICATION (IPC).pptxINTER PROCESS COMMUNICATION (IPC).pptx
INTER PROCESS COMMUNICATION (IPC).pptx
LECO9
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
Danish Javed
 
Multivector and multiprocessor
Multivector and multiprocessorMultivector and multiprocessor
Multivector and multiprocessorKishan Panara
 
Lecture 3 parallel programming platforms
Lecture 3   parallel programming platformsLecture 3   parallel programming platforms
Lecture 3 parallel programming platforms
Vajira Thambawita
 
program flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architectureprogram flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architecture
Pankaj Kumar Jain
 
Flynns classification
Flynns classificationFlynns classification
Flynns classification
Yasir Khan
 
Resource management
Resource managementResource management
Resource management
Dr Sandeep Kumar Poonia
 
6.Distributed Operating Systems
6.Distributed Operating Systems6.Distributed Operating Systems
6.Distributed Operating Systems
Dr Sandeep Kumar Poonia
 
system interconnect architectures in ACA
system interconnect architectures in ACAsystem interconnect architectures in ACA
system interconnect architectures in ACA
Pankaj Kumar Jain
 

What's hot (20)

Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
distributed shared memory
 distributed shared memory distributed shared memory
distributed shared memory
 
Parallel computing persentation
Parallel computing persentationParallel computing persentation
Parallel computing persentation
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed Computing
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
 
Feng’s classification
Feng’s classificationFeng’s classification
Feng’s classification
 
Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]
 
Daa notes 1
Daa notes 1Daa notes 1
Daa notes 1
 
Graph coloring using backtracking
Graph coloring using backtrackingGraph coloring using backtracking
Graph coloring using backtracking
 
Disk scheduling
Disk schedulingDisk scheduling
Disk scheduling
 
Load Balancing In Distributed Computing
Load Balancing In Distributed ComputingLoad Balancing In Distributed Computing
Load Balancing In Distributed Computing
 
INTER PROCESS COMMUNICATION (IPC).pptx
INTER PROCESS COMMUNICATION (IPC).pptxINTER PROCESS COMMUNICATION (IPC).pptx
INTER PROCESS COMMUNICATION (IPC).pptx
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
 
Multivector and multiprocessor
Multivector and multiprocessorMultivector and multiprocessor
Multivector and multiprocessor
 
Lecture 3 parallel programming platforms
Lecture 3   parallel programming platformsLecture 3   parallel programming platforms
Lecture 3 parallel programming platforms
 
program flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architectureprogram flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architecture
 
Flynns classification
Flynns classificationFlynns classification
Flynns classification
 
Resource management
Resource managementResource management
Resource management
 
6.Distributed Operating Systems
6.Distributed Operating Systems6.Distributed Operating Systems
6.Distributed Operating Systems
 
system interconnect architectures in ACA
system interconnect architectures in ACAsystem interconnect architectures in ACA
system interconnect architectures in ACA
 

Viewers also liked

Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
Murtadha Alsabbagh
 
Parallel Algorithm Models
Parallel Algorithm ModelsParallel Algorithm Models
Parallel Algorithm Models
Martin Coronel
 
Parallel computing
Parallel computingParallel computing
Parallel computing
Vinay Gupta
 
Randomized Algorithm
Randomized AlgorithmRandomized Algorithm
Randomized Algorithm
Kanishka Khandelwal
 
Parallel Computing
Parallel ComputingParallel Computing
Parallel Computing
Ameya Waghmare
 
Applications of paralleL processing
Applications of paralleL processingApplications of paralleL processing
Applications of paralleL processing
Page Maker
 
Introduction to parallel processing
Introduction to parallel processingIntroduction to parallel processing
Introduction to parallel processing
Page Maker
 
Parallel sorting algorithm
Parallel sorting algorithmParallel sorting algorithm
Parallel sorting algorithm
Richa Kumari
 
Hypercubes In Hbase
Hypercubes In HbaseHypercubes In Hbase
Hypercubes In HbaseGeorge Ang
 
Analysis and design of a half hypercube interconnection network topology
Analysis and design of a half hypercube interconnection network topologyAnalysis and design of a half hypercube interconnection network topology
Analysis and design of a half hypercube interconnection network topology
Amir Masoud Sefidian
 
Broadcast in Hypercube
Broadcast in HypercubeBroadcast in Hypercube
Broadcast in Hypercube
Sujith Jay Nair
 
Linked Data Hypercubes
Linked Data HypercubesLinked Data Hypercubes
Linked Data Hypercubes
Dave Reynolds
 
Chapter - 04 Basic Communication Operation
Chapter - 04 Basic Communication OperationChapter - 04 Basic Communication Operation
Chapter - 04 Basic Communication Operation
Nifras Ismail
 
Part IV E Commerce Course Power Point
Part IV E Commerce Course Power PointPart IV E Commerce Course Power Point
Part IV E Commerce Course Power Point
Daniel Bond
 
Part III E Commerce Course Power Point
Part III E Commerce Course Power PointPart III E Commerce Course Power Point
Part III E Commerce Course Power Point
Daniel Bond
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduce
Pietro Michiardi
 
PRAM algorithms from deepika
PRAM algorithms from deepikaPRAM algorithms from deepika
PRAM algorithms from deepika
guest1f4fb3
 
Basics of ecommerce part1
Basics of ecommerce part1Basics of ecommerce part1
Basics of ecommerce part1
Madhav Suratkar
 

Viewers also liked (20)

Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
 
Parallel Algorithm Models
Parallel Algorithm ModelsParallel Algorithm Models
Parallel Algorithm Models
 
Parallel computing
Parallel computingParallel computing
Parallel computing
 
Randomized Algorithm
Randomized AlgorithmRandomized Algorithm
Randomized Algorithm
 
Parallel Computing
Parallel ComputingParallel Computing
Parallel Computing
 
Applications of paralleL processing
Applications of paralleL processingApplications of paralleL processing
Applications of paralleL processing
 
Introduction to parallel processing
Introduction to parallel processingIntroduction to parallel processing
Introduction to parallel processing
 
Parallel sorting algorithm
Parallel sorting algorithmParallel sorting algorithm
Parallel sorting algorithm
 
Hypercubes In Hbase
Hypercubes In HbaseHypercubes In Hbase
Hypercubes In Hbase
 
Analysis and design of a half hypercube interconnection network topology
Analysis and design of a half hypercube interconnection network topologyAnalysis and design of a half hypercube interconnection network topology
Analysis and design of a half hypercube interconnection network topology
 
Broadcast in Hypercube
Broadcast in HypercubeBroadcast in Hypercube
Broadcast in Hypercube
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Linked Data Hypercubes
Linked Data HypercubesLinked Data Hypercubes
Linked Data Hypercubes
 
Paralell
ParalellParalell
Paralell
 
Chapter - 04 Basic Communication Operation
Chapter - 04 Basic Communication OperationChapter - 04 Basic Communication Operation
Chapter - 04 Basic Communication Operation
 
Part IV E Commerce Course Power Point
Part IV E Commerce Course Power PointPart IV E Commerce Course Power Point
Part IV E Commerce Course Power Point
 
Part III E Commerce Course Power Point
Part III E Commerce Course Power PointPart III E Commerce Course Power Point
Part III E Commerce Course Power Point
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduce
 
PRAM algorithms from deepika
PRAM algorithms from deepikaPRAM algorithms from deepika
PRAM algorithms from deepika
 
Basics of ecommerce part1
Basics of ecommerce part1Basics of ecommerce part1
Basics of ecommerce part1
 

Similar to Parallel Algorithms

Nbvtalkatjntuvizianagaram
NbvtalkatjntuvizianagaramNbvtalkatjntuvizianagaram
Nbvtalkatjntuvizianagaram
Nagasuri Bala Venkateswarlu
 
Chap5 slides
Chap5 slidesChap5 slides
Chap5 slides
BaliThorat1
 
Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptx
krnaween
 
cs1311lecture25wdl.ppt
cs1311lecture25wdl.pptcs1311lecture25wdl.ppt
cs1311lecture25wdl.ppt
FannyBellows
 
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptxICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
johnsmith96441
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5
Shah Zaib
 
Unit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptxUnit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptx
Medicaps University
 
CSA unit5.pptx
CSA unit5.pptxCSA unit5.pptx
CSA unit5.pptx
AbcvDef
 
12. Parallel Algorithms.pptx
12. Parallel Algorithms.pptx12. Parallel Algorithms.pptx
12. Parallel Algorithms.pptx
MohAlyasin1
 
Matrix multiplication
Matrix multiplicationMatrix multiplication
Matrix multiplication
International Islamic University
 
unit 2 hpc.pptx
unit 2 hpc.pptxunit 2 hpc.pptx
unit 2 hpc.pptx
gopal467344
 
Multicore_Architecture Book.pdf
Multicore_Architecture Book.pdfMulticore_Architecture Book.pdf
Multicore_Architecture Book.pdf
SwatantraPrakash5
 
Report on High Performance Computing
Report on High Performance ComputingReport on High Performance Computing
Report on High Performance Computing
Prateek Sarangi
 
Platform Technology (2).pdf
Platform Technology (2).pdfPlatform Technology (2).pdf
Platform Technology (2).pdf
FranzLawrenzDeTorres1
 
Pipelining slides
Pipelining slides Pipelining slides
Pipelining slides
PrasantaKumarDash2
 
Coa.ppt2
Coa.ppt2Coa.ppt2
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
Ali Raza
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
Ali Raza
 

Similar to Parallel Algorithms (20)

Lecture1
Lecture1Lecture1
Lecture1
 
Nbvtalkatjntuvizianagaram
NbvtalkatjntuvizianagaramNbvtalkatjntuvizianagaram
Nbvtalkatjntuvizianagaram
 
Chap5 slides
Chap5 slidesChap5 slides
Chap5 slides
 
Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptx
 
cs1311lecture25wdl.ppt
cs1311lecture25wdl.pptcs1311lecture25wdl.ppt
cs1311lecture25wdl.ppt
 
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptxICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5
 
Unit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptxUnit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptx
 
CSA unit5.pptx
CSA unit5.pptxCSA unit5.pptx
CSA unit5.pptx
 
Lecture3
Lecture3Lecture3
Lecture3
 
12. Parallel Algorithms.pptx
12. Parallel Algorithms.pptx12. Parallel Algorithms.pptx
12. Parallel Algorithms.pptx
 
Matrix multiplication
Matrix multiplicationMatrix multiplication
Matrix multiplication
 
unit 2 hpc.pptx
unit 2 hpc.pptxunit 2 hpc.pptx
unit 2 hpc.pptx
 
Multicore_Architecture Book.pdf
Multicore_Architecture Book.pdfMulticore_Architecture Book.pdf
Multicore_Architecture Book.pdf
 
Report on High Performance Computing
Report on High Performance ComputingReport on High Performance Computing
Report on High Performance Computing
 
Platform Technology (2).pdf
Platform Technology (2).pdfPlatform Technology (2).pdf
Platform Technology (2).pdf
 
Pipelining slides
Pipelining slides Pipelining slides
Pipelining slides
 
Coa.ppt2
Coa.ppt2Coa.ppt2
Coa.ppt2
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
 

More from Dr Sandeep Kumar Poonia

Soft computing
Soft computingSoft computing
Soft computing
Dr Sandeep Kumar Poonia
 
An improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithmAn improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithm
Dr Sandeep Kumar Poonia
 
Modified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithmModified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithm
Dr Sandeep Kumar Poonia
 
Enhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithmEnhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithm
Dr Sandeep Kumar Poonia
 
RMABC
RMABCRMABC
Memetic search in differential evolution algorithm
Memetic search in differential evolution algorithmMemetic search in differential evolution algorithm
Memetic search in differential evolution algorithm
Dr Sandeep Kumar Poonia
 
Improved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithmImproved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithm
Dr Sandeep Kumar Poonia
 
Comparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithmComparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithm
Dr Sandeep Kumar Poonia
 
A novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithmA novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithm
Dr Sandeep Kumar Poonia
 
Multiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked listsMultiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked lists
Dr Sandeep Kumar Poonia
 
Sunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithmSunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithm
Dr Sandeep Kumar Poonia
 
New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm
Dr Sandeep Kumar Poonia
 
A new approach of program slicing
A new approach of program slicingA new approach of program slicing
A new approach of program slicing
Dr Sandeep Kumar Poonia
 
Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...Dr Sandeep Kumar Poonia
 
Enhanced abc algo for tsp
Enhanced abc algo for tspEnhanced abc algo for tsp
Enhanced abc algo for tsp
Dr Sandeep Kumar Poonia
 
Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...
Dr Sandeep Kumar Poonia
 

More from Dr Sandeep Kumar Poonia (20)

Soft computing
Soft computingSoft computing
Soft computing
 
An improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithmAn improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithm
 
Modified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithmModified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithm
 
Enhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithmEnhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithm
 
RMABC
RMABCRMABC
RMABC
 
Memetic search in differential evolution algorithm
Memetic search in differential evolution algorithmMemetic search in differential evolution algorithm
Memetic search in differential evolution algorithm
 
Improved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithmImproved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithm
 
Comparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithmComparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithm
 
A novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithmA novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithm
 
Multiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked listsMultiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked lists
 
Sunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithmSunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithm
 
New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm
 
A new approach of program slicing
A new approach of program slicingA new approach of program slicing
A new approach of program slicing
 
Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...
 
Enhanced abc algo for tsp
Enhanced abc algo for tspEnhanced abc algo for tsp
Enhanced abc algo for tsp
 
Database aggregation using metadata
Database aggregation using metadataDatabase aggregation using metadata
Database aggregation using metadata
 
Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...
 
Lecture28 tsp
Lecture28 tspLecture28 tsp
Lecture28 tsp
 
Lecture27 linear programming
Lecture27 linear programmingLecture27 linear programming
Lecture27 linear programming
 
Lecture26
Lecture26Lecture26
Lecture26
 

Recently uploaded

1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
PedroFerreira53928
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
EduSkills OECD
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
Col Mukteshwar Prasad
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 

Recently uploaded (20)

1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 

Parallel Algorithms

  • 1. Algorithms Parallel Algorithms By: Sandeep Kumar Poonia Asst. Professor, Jagannath University, Jaipur 1
  • 2. What is Parallelism in Computers? Parallelism is a digital computer performing more than one task at the same time Examples • IO chips : Most computers contain special circuits for IO devices which allow some task to be performed in parallel • Pipelining of Instructions : Some cpu's pipeline the execution of instructions 2
  • 3. Example……… • Multiple Arithmetic units (AU) : Some CPUs contain multiple AU so it can perform more than one arithmetic operation at the same time. • We are interested in parallelism involving more than one CPUs 3
  • 4. Common Terms for Parallelism • Concurrent Processing: A program is divided into multiple processes which are run on a single processor. The processes are time sliced on the single processor • Distributed Processing: A program is divided into multiple processes which are run on multiple distinct machines. The multiple machines are usual connected by a LAN Machines used typically are workstations running multiple programs 4
  • 5. Common Terms for Parallelism…. • Parallel Processing: A program is divided into multiple processes which are run on multiple processors. The processors normally: – are in one machine – execute one program at a time – have high speed communications between them 5
  • 6. Parallel Programming • Issues in parallel programming not found in sequential programming • Task decomposition, allocation and sequencing • Breaking down the problem into smaller tasks (processes) than can be run in parallel • Allocating the parallel tasks to different processors • Sequencing the tasks in the proper order • Efficiently use the processors 6
  • 7. Parallel Programming • Communication of interim results between processors: The goal is to reduce the cost of communication between processors. Task decomposition and allocation affect communication costs • Synchronization of processes: Some processes must wait at predetermined points for results from other processes. • Different machine architectures 7
  • 8. Performance Issues • Scalability: Using more nodes should allow a job to run faster, allow a larger job to run in the same time • Load Balancing: All nodes should have the same amount of work, Avoid having nodes idle while others are computing • Bottlenecks: Communication bottlenecks • Too many messages are traveling on the same path • Serial bottlenecks: Communication Message passing is slower than computation 8
  • 9. Parallel Machines Parameters used to describe or classify parallel computers: • Type and number of processors • Processor interconnections • Global control • Synchronous vs. asynchronous operation 9
  • 10. Type and number of processors • Massively parallel : Computer systems with thousands of processors • Ex: Parallel Supercomputers CM-5, Intel Paragon • Coarse-grained parallelism : Few (~10) processor, usually high powered in system 10
  • 11. Processor interconnections Parallel computers may be loosely divided into two groups: • Shared Memory (or Multiprocessor) • Message Passing (or Multicomputers) 11
  • 12. 12 A simple parallel algorithm Adding n numbers in parallel
  • 13. 13 A simple parallel algorithm • Example for 8 numbers: We start with 4 processors and each of them adds 2 items in the first step. • The number of items is halved at every subsequent step. Hence log n steps are required for adding n numbers. The processor requirement is O(n) . We have omitted many details from our description of the algorithm. • How do we allocate tasks to processors? • Where is the input stored? • How do the processors access the input as well as intermediate results? We do not ask these questions while designing sequential algorithms.
  • 14. 14 How do we analyze a parallel algorithm? A parallel algorithms is analyzed mainly in terms of its time, processor and work complexities. • Time complexity T(n) : How many time steps are needed? • Processor complexity P(n) : How many processors are used? • Work complexity W(n) : What is the total work done by all the processors? Hence, For our example: T(n) = O(log n) P(n) = O(n) W(n) = O(n log n)
  • 15. 15 How do we judge efficiency? • We say A1 is more efficient than A2 if W1(n) = o(W2(n)) regardless of their time complexities. For example, W1(n) = O(n) and W2(n) = O(n log n) • Consider two parallel algorithms A1and A2 for the same problem. A1: W1(n) work in T1(n) time. A2: W2(n) work in T2(n) time. If W1(n) and W2(n) are asymptotically the same then A1 is more efficient than A2 if T1(n) = o(T2(n)). For example, W1(n) = W2(n) = O(n), but T1(n) = O(log n), T2(n) = O(log2 n)
  • 16. 16 How do we judge efficiency? • It is difficult to give a more formal definition of efficiency. Consider the following situation. For A1 , W1(n) = O(n log n) and T1(n) = O(n). For A2 , W 2(n) = O(n log2 n) and T2(n) = O(log n) • It is difficult to say which one is the better algorithm. Though A1 is more efficient in terms of work, A2 runs much faster. • Both algorithms are interesting and one may be better than the other depending on a specific parallel machine.
  • 17. 17 Optimal parallel algorithms • Consider a problem, and let T(n) be the worst-case upper bound on the time of the best serial algorithm for an input of length n. • Assume also that T(n) is a lower bound for solving the problem, so we cannot have a better upper bound. • Consider a parallel algorithm for the same problem that does W(n) work in Tpar(n) time. The parallel algorithm is work-optimal if W(n) = O(T(n)). It is work-time optimal if, in addition, Tpar(n) cannot be improved.
  • 18. 18 A simple parallel algorithm Adding n numbers in parallel
  • 19. 19 A work-optimal algorithm for adding n numbers Step 1. • Use only n/log n processors and assign log n numbers to each processor. • Each processor adds its log n numbers sequentially in O(log n) time. Step 2. • We are left with only n/log n numbers. We now run our original algorithm on these n/log n numbers (simulated in the sketch below). • Now T(n) = O(log n) and W(n) = O((n/log n) x log n) = O(n).
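A sequential Python simulation of this two-phase scheme (an illustration, not the slides' own code; for simplicity the block size log n is assumed, and the partial-sum list is padded when its length is odd):

    import math

    def work_optimal_sum(values):
        n = len(values)
        block = max(1, int(math.log2(n)))               # log n numbers per processor
        # Step 1: each of the ~n/log n processors adds its block sequentially.
        partials = [sum(values[i:i + block]) for i in range(0, n, block)]
        # Step 2: the original O(log n)-step halving algorithm on the partial sums.
        while len(partials) > 1:
            if len(partials) % 2:
                partials.append(0)                      # pad to an even count
            partials = [partials[i] + partials[i + 1] for i in range(0, len(partials), 2)]
        return partials[0]

    print(work_optimal_sum(list(range(1, 17))))         # 136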
  • 20. 20 Why is parallel computing important? • We can justify the importance of parallel computing for two reasons: very large application domains, and the physical limitations of VLSI circuits. • Though computers are getting faster and faster, user demand for solving very large problems is growing at an even faster rate. • Some examples include weather forecasting, simulation of protein folding, computational physics, etc.
  • 21. 21 Physical limitations of VLSI circuits • The Pentium III processor uses 180 nanometre (nm) technology, i.e., a circuit element like a transistor can be etched within 180 x 10^-9 m. • The Pentium IV processor uses 130 nm technology. • Intel has recently trialled processors made using 65 nm technology.
  • 22. 22 How many transistors can we pack? • Pentium III has about 42 million transistors and Pentium IV about 55 million transistors. • The number of transistors on a chip is approximately doubling every 18 months (Moore’s Law). • There are now 100 transistors for every ant on Earth
  • 23. 23 Physical limitations of VLSI circuits • All semiconductor devices are Si based. It is fairly safe to assume that a circuit element will take at least a single Si atom. • The covalent bond in Si has a bond length of approximately 0.2 nm. • Hence, we will reach the limit of miniaturization very soon. • The upper bound on the speed of electronic signals is 3 x 10^8 m/sec, the speed of light. • Hence, communication between two adjacent transistors at that scale will take approximately 10^-18 sec. • If we assume that a floating point operation involves the switching of at least a few thousand transistors, such an operation will take about 10^-15 sec in the limit. • Hence, we are looking at 1000-teraflop machines at the peak of this technology (1 TFLOPS = 10^12 FLOPS; 1 flop = one floating point operation). This is a very optimistic scenario; a rough check of the arithmetic follows.
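As a sanity check, here is a small Python sketch (not from the slides) that redoes this back-of-envelope arithmetic; the transistor spacing and the transistors-per-flop count are order-of-magnitude assumptions, not measurements:

    # Rough re-check of the numbers above.
    c = 3e8                          # speed of light in m/sec
    spacing = 3e-10                  # ~ one Si bond length in m (atomic limit)
    t_signal = spacing / c           # time between adjacent transistors: ~1e-18 sec
    t_flop = 1000 * t_signal         # assume a flop switches a few thousand transistors
    peak = 1 / t_flop                # ~1e15 FLOPS = 1000 teraflops
    print(f"{t_signal:.0e} sec, {t_flop:.0e} sec, {peak:.0e} FLOPS")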
  • 24. 24 Other Problems • The most difficult problem is controlling power dissipation. • 75 watts is considered the maximum power output of a processor. • As we pack in more transistors, the power output goes up and better cooling is necessary. • Intel cooled its 8 GHz demo processor using liquid nitrogen!
  • 25. 25 The advantages of parallel computing • Parallel computing offers the possibility of overcoming such physical limits by solving problems in parallel. • In principle, thousands, even millions, of processors can be used to solve a problem in parallel, and today's fastest parallel computers have already reached teraflop speeds. • Today's microprocessors already use several parallel processing techniques, like instruction-level parallelism and pipelined instruction fetching. • Intel uses hyper-threading in the Pentium IV mainly because the processor is clocked at 3 GHz while the memory bus operates only at about 400-800 MHz.
  • 26. 26 Problems in parallel computing • The sequential or uni-processor computing model is based on von Neumann’s stored program model. • A program is written, compiled and stored in memory and it is executed by bringing one instruction at a time to the CPU.
  • 27. 27 Problems in parallel computing • Programs are written keeping this model in mind. Hence, there is a close match between the software and the hardware on which it runs. • The theoretical RAM model captures these concepts nicely. • There are many different models of parallel computing, and each model is programmed in a different way. • Hence an algorithm designer has to keep a specific model in mind when designing an algorithm. • Most parallel machines are suitable for solving specific types of problems. • Designing operating systems is also a major issue.
  • 28. 28 The PRAM model n processors are connected to a shared memory.
  • 29. 29 The PRAM model • Each processor should be able to access any memory location in each clock cycle. • Hence, there may be conflicts in memory access. Also, memory management hardware needs to be very complex. • We need some kind of hardware to connect the processors to individual locations in the shared memory.
  • 30. 30 The PRAM model A more realistic PRAM model
  • 31. Models of parallel computation Parallel computational models can be broadly classified into two categories: • Single Instruction Multiple Data (SIMD) • Multiple Instruction Multiple Data (MIMD) 31
  • 32. Models of parallel computation • SIMD models are used for solving problems which have regular structures. We will mainly study SIMD models in this course. • MIMD models are more general and used for solving problems which lack regular structures. 32
  • 33. SIMD models An N-processor SIMD computer has the following characteristics: • Each processor can store both program and data in its local memory. • Each processor stores an identical copy of the same program in its local memory. 33
  • 34. SIMD models • At each clock cycle, each processor executes the same instruction from this program. However, the data are different in different processors. • The processors communicate among themselves either through an interconnection network or through a shared memory. 34
  • 35. Design issues for network SIMD models • A network SIMD model is a graph. The nodes of the graph are the processors and the edges are the links between the processors. • Since each processor solves only a small part of the overall problem, it is necessary that processors communicate with each other while solving the overall problem. 35
  • 36. Design issues for network SIMD models • The main design issues for network SIMD models are communication diameter, bisection width, and scalability. • We will discuss the two most popular network models, the mesh and the hypercube, in this lecture. 36
  • 37. Communication diameter • The communication diameter is the diameter of the graph that represents the network model. The diameter of a graph is the largest distance between any pair of nodes. • If the diameter of a model is d, Ω(d) is a lower bound for any computation that requires data to travel between the two furthest processors. 37
  • 38. Communication diameter • The data can be distributed in such a way that the two furthest nodes may need to communicate. 38
  • 39. Communication diameter Communication between two furthest nodes takes Ω(d) time steps. 39
  • 40. Bisection width • The bisection width of a network model is the minimum number of links that must be removed to decompose the graph into two equal parts. • If the bisection width is large, more information can be exchanged between the two halves of the graph and hence problems can be solved faster. 40
  • 41. Bisection width: dividing the graph into two parts. 41
  • 42. Scalability • A network model must be scalable so that more processors can be easily added when new resources are available. • The model should be regular so that each processor has a small number of links incident on it. 42
  • 43. Scalability • If the number of links per processor is large, it is difficult to add new processors, as too many new links have to be added. • There is a tension here: if we want to keep the diameter small, we need more links per processor, but if we want our model to be scalable, we need fewer links per processor. 43
  • 44. Diameter and Scalability • The best model in terms of diameter is the complete graph. The diameter is 1. However, if we need to add a new node to an n-processor machine, we need n - 1 new links. 44
  • 45. Diameter and Scalability • The best model in terms of scalability is the linear array. We need to add only one link for a new processor. However, the diameter is n - 1 for a machine with n processors. 45
  • 46. The mesh architecture • Each internal processor of a 2-dimensional mesh is connected to 4 neighbors. • When we combine two different meshes, only the processors on the boundary need extra links. Hence it is highly scalable. 46
  • 47. The mesh architecture • Both the diameter and the bisection width of an n-processor, 2-dimensional mesh are O(√n). For example, a 4 x 4 mesh (n = 16) has diameter 2(√n - 1) = 6 and bisection width √n = 4, as the sketch below checks. 47
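A tiny illustrative helper (not from the slides) that evaluates both quantities for a square mesh, assuming n is a perfect square:

    import math

    def mesh_stats(n):
        side = math.isqrt(n)          # the mesh is side x side
        return 2 * (side - 1), side   # corner-to-corner distance, middle cut

    print(mesh_stats(16))             # (6, 4): both grow as O(sqrt(n))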
  • 48. The hypercube architecture: hypercubes of 0, 1, 2 and 3 dimensions. 48
  • 49. The hypercube architecture • The diameter of a d-dimensional hypercube is d, as we need to flip at most d bits (traverse d links) to reach one processor from another. • The bisection width of a d-dimensional hypercube is 2^(d-1). 49
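Both facts follow from labelling the 2^d processors with d-bit strings and joining labels that differ in exactly one bit. A short Python sketch (illustrative, not from the slides):

    def neighbors(node, d):
        return [node ^ (1 << bit) for bit in range(d)]     # flip each bit once

    def distance(u, v):
        return bin(u ^ v).count("1")    # Hamming distance = shortest path length

    d = 3
    print(neighbors(0b000, d))          # [1, 2, 4]: the d neighbors of node 000
    print(distance(0b000, 0b111))       # 3 = d, the diameter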
  • 50. The hypercube architecture • The hypercube is a highly scalable architecture. Two d-dimensional hypercubes can easily be combined to form a (d+1)-dimensional hypercube. • The hypercube has several variants, like the butterfly, the shuffle-exchange network and cube-connected cycles. 50
  • 51. Adding n numbers on the mesh: adding n numbers in √n steps. 51
  • 53. Adding n numbers on the hypercube: adding n numbers in log n steps. 53
  • 55. 55 Complexity Analysis: Given n processors connected via a hypercube, S_Sum_Hypercube needs log n rounds to compute the sum. Since n messages are sent and received in each round, the total number of messages is O(n log n). 1. Time complexity: O(log n). 2. Message complexity: O(n log n).
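The S_Sum_Hypercube procedure itself is not reproduced in this transcript, but the exchange pattern it describes can be simulated as below (a sketch under the assumption that in each of the log n rounds every node exchanges its value with its partner along one dimension, which is what yields n messages per round):

    def hypercube_sum(values):
        n = len(values)                  # assume n = 2**d
        d = n.bit_length() - 1
        vals = list(values)
        for bit in range(d):             # one round per dimension
            # Every node adds the value of its partner across this dimension.
            vals = [vals[i] + vals[i ^ (1 << bit)] for i in range(n)]
        return vals                      # after d rounds every node holds the sum

    print(hypercube_sum(list(range(1, 9))))   # [36, 36, 36, 36, 36, 36, 36, 36]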
  • 56. Classification of the PRAM model • In the PRAM model, processors communicate by reading from and writing to the shared memory locations. 56
  • 57. Classification of the PRAM model • The power of a PRAM depends on the kind of access to the shared memory locations. 57
  • 58. Classification of the PRAM model In every clock cycle, • in the Exclusive Read Exclusive Write (EREW) PRAM, each memory location can be accessed by only one processor; • in the Concurrent Read Exclusive Write (CREW) PRAM, multiple processors can read from the same memory location, but only one processor can write. 58
  • 59. Classification of the PRAM model • In the Concurrent Read Concurrent Write (CRCW) PRAM, multiple processors can read from or write to the same memory location. 59
  • 60. Classification of the PRAM model • It is easy to allow concurrent reading. However, concurrent writing gives rise to conflicts. • If multiple processors write to the same memory location simultaneously, it is not clear what is written to the memory location. 60
  • 61. Classification of the PRAM model • In the Common CRCW PRAM, all the processors must write the same value. • In the Arbitrary CRCW PRAM, one of the processors arbitrarily succeeds in writing. • In the Priority CRCW PRAM, processors have priorities associated with them and the highest priority processor succeeds in writing. 61
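The three write-conflict policies can be captured in a few lines of Python (a toy model of a single shared memory cell, not any real machine; priorities are given as an explicit map, smaller value = higher priority):

    def crcw_write(requests, policy, priority=None):
        """requests: dict {processor_id: value}; returns the value stored."""
        if policy == "common":
            values = set(requests.values())
            if len(values) != 1:
                raise ValueError("Common CRCW: all writers must write the same value")
            return values.pop()
        if policy == "arbitrary":
            return next(iter(requests.values()))          # any one writer succeeds
        if policy == "priority":
            winner = min(requests, key=lambda pid: priority[pid])
            return requests[winner]                       # highest priority wins
        raise ValueError(f"unknown policy: {policy}")

    print(crcw_write({0: 5, 1: 5}, "common"))                      # 5
    print(crcw_write({0: 7, 1: 9, 2: 7}, "arbitrary"))             # one of 7, 9
    print(crcw_write({0: 7, 1: 9, 2: 7}, "priority", {0: 2, 1: 0, 2: 1}))  # 9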
  • 62. Classification of the PRAM model • The EREW PRAM is the weakest and the Priority CRCW PRAM is the strongest PRAM model. • The relative powers of the different PRAM models are as follows: EREW ≤ CREW ≤ Common CRCW ≤ Arbitrary CRCW ≤ Priority CRCW. 62
  • 63. Classification of the PRAM model • An algorithm designed for a weaker model can be executed within the same time and work complexities on a stronger model. 63
  • 64. Classification of the PRAM model • We say model A is less powerful than model B if either: • the time complexity for solving a problem is asymptotically lower in model B than in model A; or, • the time complexities being the same, the processor or work complexity is asymptotically lower in model B than in model A. 64
  • 65. Classification of the PRAM model An algorithm designed for a stronger PRAM model can be simulated on a weaker model either with asymptotically more processors (work) or with asymptotically more time. 65
  • 66. Adding n numbers on a PRAM 66
  • 67. Adding n numbers on a PRAM • This algorithm works on the EREW PRAM model as there are no read or write conflicts. • We will use this algorithm to design a matrix multiplication algorithm on the EREW PRAM. 67
  • 68. Matrix multiplication For simplicity, we assume that n = 2^p for some integer p. We want C = A x B, where c_{i,j} = Σ_{m=1}^{n} a_{i,m} x b_{m,j}. 68
  • 69. Matrix multiplication • Each c_{i,j}, 1 ≤ i, j ≤ n, can be computed in parallel. • We allocate n processors for computing c_{i,j}. Suppose these processors are P1, P2, ..., Pn. • In the first time step, processor P_m, 1 ≤ m ≤ n, computes the product a_{i,m} x b_{m,j}. • We now have n numbers, and we use the addition algorithm to sum these n numbers in log n time. 69
  • 70. Matrix multiplication • Computing each c_{i,j}, 1 ≤ i, j ≤ n, takes n processors and log n time. • Since there are n^2 such c_{i,j}'s, we need overall O(n^3) processors and O(log n) time. • The processor requirement can be reduced to O(n^3 / log n). Exercise! • Hence, the work complexity is O(n^3). 70
  • 71. Matrix multiplication • However, this algorithm requires concurrent read capability. • Note that each element a_{i,j} (and b_{i,j}) participates in computing n elements of the C matrix. • Hence n different processors will try to read each a_{i,j} (and b_{i,j}) in our algorithm. 71
  • 72. Matrix multiplication For simplicity, we assume that n = 2^p for some integer p. 72
  • 73. Matrix multiplication • Hence our algorithm runs on the CREW PRAM, and we need to avoid the read conflicts to make it run on the EREW PRAM. • We will create n copies of each of the elements a_{i,j} (and b_{i,j}). Then one copy can be used for computing each c_{i,j}. 73
  • 74. Matrix multiplication Creating n copies of a number in O(log n) time using O(n) processors on the EREW PRAM: • In the first step, one processor reads the number and creates a copy. Hence, there are two copies now. • In the second step, two processors read these two copies and create four copies. 74
  • 75. Matrix multiplication • Since the number of copies doubles in every step, n copies are created in O(log n) steps; the sketch below simulates this doubling. • Though we need n processors, the processor requirement can be reduced to O(n / log n). 75
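A minimal simulation of the doubling broadcast (illustrative; processor scheduling is abstracted away — the point is that each step every existing copy is read by exactly one processor and written to a fresh cell, so reads and writes stay exclusive):

    def erew_broadcast(x, n):
        copies, steps = [x], 0
        while len(copies) < n:
            copies = copies + copies[: n - len(copies)]   # at most double per step
            steps += 1
        return copies, steps

    copies, steps = erew_broadcast(42, 8)
    print(len(copies), steps)    # 8 copies in 3 = log2(8) steps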
  • 76. Matrix multiplication • Since there are n^2 elements in the matrix A (and in B), we need O(n^3 / log n) processors and O(log n) time to create n copies of each element. • After this, there are no read conflicts in our algorithm. The overall matrix multiplication algorithm now takes O(log n) time and O(n^3 / log n) processors on the EREW PRAM. 76
  • 77. Matrix multiplication • The memory requirement is of course much higher for the EREW PRAM. 77
  • 78. 78 Using n^3 Processors Algorithm MatMult_CREW
  /* Step 1 */
  forall P_{i,j,k}, where 1 ≤ i, j, k ≤ n, do in parallel
      C[i,j,k] = A[i,k] * B[k,j]
  endfor
  /* Step 2 */
  for l = 1 to log n do
      forall P_{i,j,k}, where 1 ≤ i, j ≤ n and 1 ≤ k ≤ n/2, do in parallel
          if (2k mod 2^l) = 0 then
              C[i,j,2k] = C[i,j,2k] + C[i,j,2k - 2^(l-1)]
          endif
      endfor
  endfor
  /* The output matrix is stored in locations C[i,j,n], where 1 ≤ i, j ≤ n */
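A direct sequential simulation of MatMult_CREW in Python (a sketch for checking the logic, not a parallel implementation; n is assumed to be a power of two, and the k axis is kept 1-indexed to match the pseudocode):

    import math

    def matmult_crew(A, B):
        n = len(A)
        C = [[[0] * (n + 1) for _ in range(n)] for _ in range(n)]
        # Step 1: all n^3 products, one (virtual) processor each.
        for i in range(n):
            for j in range(n):
                for k in range(1, n + 1):
                    C[i][j][k] = A[i][k - 1] * B[k - 1][j]
        # Step 2: log n rounds of pairwise accumulation along the k axis.
        for l in range(1, int(math.log2(n)) + 1):
            for i in range(n):
                for j in range(n):
                    for k in range(1, n + 1):
                        if k % (2 ** l) == 0:
                            C[i][j][k] += C[i][j][k - 2 ** (l - 1)]
        return [[C[i][j][n] for j in range(n)] for i in range(n)]

    print(matmult_crew([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]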
  • 79. 79 Complexity Analysis • In the first step, the products are computed in parallel in constant time, that is, O(1). • These products are summed in O(log n) time during the second step. Therefore, the run time is O(log n). • Since the number of processors used is n^3, the cost is O(n^3 log n). 1. Run time, T(n) = O(log n). 2. Number of processors, P(n) = n^3. 3. Cost, C(n) = O(n^3 log n).
  • 80. 80 Reducing the Number of Processors In the above algorithm, all the processors were busy during the first step, but not all of them performed addition operations during the second step. • The second step consists of log n iterations. During the first iteration, only n^3/2 processors perform additions, only n^3/4 perform additions in the second iteration, and so on. • With this understanding, we may be able to use a smaller machine with only n^3/log n processors.
  • 81. 81 Reducing the Number of Processors 1. Each processor P_{i,j,k}, where 1 ≤ i, j ≤ n and 1 ≤ k ≤ n/log n, computes the sum of log n products. This step produces n^3/log n partial sums. 2. The partial sums produced in step 1 are then added to produce the resulting matrix as discussed before; a sketch follows.
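A sketch of this reduced-processor scheme, again as a sequential simulation (the grouping of log n products per virtual processor is the point being illustrated; the odd-length partial-sum list is padded as before):

    import math

    def matmult_fewer_processors(A, B):
        n = len(A)
        g = max(1, int(math.log2(n)))            # log n products per processor
        C = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # Step 1: one partial sum per group of g consecutive products.
                partials = [sum(A[i][m] * B[m][j] for m in range(s, min(s + g, n)))
                            for s in range(0, n, g)]
                # Step 2: the O(log n) halving addition on the partial sums.
                while len(partials) > 1:
                    if len(partials) % 2:
                        partials.append(0)
                    partials = [partials[t] + partials[t + 1]
                                for t in range(0, len(partials), 2)]
                C[i][j] = partials[0]
        return C

    print(matmult_fewer_processors([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]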
  • 82. 82 Complexity Analysis 1. Run time, T(n) = O(log n). 2. Number of processors, P(n) = n^3/log n. 3. Cost, C(n) = O(n^3).