The document discusses parallel algorithms and parallel computing. It begins by defining parallelism in computers as performing more than one task at the same time. Examples of parallelism include I/O chips and pipelining of instructions. Common terms for parallelism are defined, including concurrent processing, distributed processing, and parallel processing. Issues in parallel programming such as task decomposition and synchronization are outlined. Performance issues like scalability and load balancing are also discussed. Different types of parallel machines and their classification are described.
2. What is Parallelism in Computers?
Parallelism means a digital computer performing more than one task at the same time.
Examples
• I/O chips: Most computers contain special circuits for I/O devices that allow some tasks to be performed in parallel.
• Pipelining of instructions: Some CPUs pipeline the execution of instructions.
3. Examples (continued)
• Multiple arithmetic units (AUs): Some CPUs contain multiple AUs, so they can perform more than one arithmetic operation at the same time.
• We are interested in parallelism involving more than one CPU.
4. Common Terms for Parallelism
• Concurrent processing: A program is divided into multiple processes which are run on a single processor. The processes are time-sliced on that single processor.
• Distributed processing: A program is divided into multiple processes which are run on multiple distinct machines. The machines are usually connected by a LAN and are typically workstations running multiple programs.
5. Common Terms for Parallelism….
• Parallel processing: A program is divided into multiple processes which are run on multiple processors. The processors normally:
– are in one machine
– execute one program at a time
– have high-speed communication between them
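The distinction between concurrent and parallel processing can be illustrated with a small sketch (in Python, which is not part of the slides): CPython time-slices threads on effectively a single processor because of its global interpreter lock, matching the definition of concurrent processing above, whereas the `multiprocessing` module would run the same tasks as separate processes on multiple CPUs, matching parallel processing. A minimal concurrent version:

```python
# Concurrent processing sketch: one program divided into several
# threads that a single interpreter time-slices. In CPython the GIL
# means these threads share one processor's worth of CPU time --
# concurrent, not parallel, in the slides' terminology.
import threading

results = [0] * 4

def task(i):
    # each piece of the divided program computes a partial result
    results[i] = sum(range(i * 100, (i + 1) * 100))

threads = [threading.Thread(target=task, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# combining the partial results gives the same answer as a
# sequential sum over range(400)
total = sum(results)
```

Swapping `threading.Thread` for `multiprocessing.Process` (with results passed back through a queue or shared array) would turn this into a parallel-processing version of the same program.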
6. Parallel Programming
• Issues in parallel programming not found in sequential programming:
• Task decomposition, allocation and sequencing:
– breaking the problem down into smaller tasks (processes) that can be run in parallel
– allocating the parallel tasks to different processors
– sequencing the tasks in the proper order
– using the processors efficiently
7. Parallel Programming
• Communication of interim results between processors: the goal is to reduce the cost of communication between processors. Task decomposition and allocation affect communication costs.
• Synchronization of processes: some processes must wait at predetermined points for results from other processes.
• Different machine architectures.
8. Performance Issues
• Scalability: using more nodes should allow a job to run faster, or allow a larger job to run in the same time.
• Load balancing: all nodes should have the same amount of work; avoid having some nodes idle while others are still computing.
• Communication bottlenecks: too many messages travelling on the same path.
• Serial bottlenecks: message passing is slower than computation.
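Static load balancing of the kind described above can be sketched as an even chunking of the work items across processing elements, so that chunk sizes differ by at most one. The helper name `balance` is hypothetical, not something from the slides:

```python
# Minimal static load-balancing sketch: split n work items into
# num_workers chunks of near-equal size, so no node sits idle while
# others still have a large backlog.

def balance(items, num_workers):
    """Return num_workers chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), num_workers)
    chunks, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)  # spread the remainder evenly
        chunks.append(items[start:start + size])
        start += size
    return chunks

# 10 items over 4 workers -> chunk sizes 3, 3, 2, 2
chunks = balance(list(range(10)), 4)
```

Real systems often need dynamic load balancing (work stealing, task queues) when task costs are uneven or unknown in advance; this sketch covers only the static case.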
9. Parallel Machines
Parameters used to describe or classify parallel
computers:
• Type and number of processors
• Processor interconnections
• Global control
• Synchronous vs. asynchronous operation
10. Type and number of processors
• Massively parallel: computer systems with thousands of processors.
• Examples: parallel supercomputers such as the CM-5 and Intel Paragon.
• Coarse-grained parallelism: a few (~10) processors, usually high-powered, in a system.
13. A simple parallel algorithm
• Example for 8 numbers: we start with 4 processors, and each of them adds 2 items in the first step.
• The number of items is halved at every subsequent step. Hence log n steps are required for adding n numbers. The processor requirement is O(n).
We have omitted many details from our description of the algorithm:
• How do we allocate tasks to processors?
• Where is the input stored?
• How do the processors access the input as well as intermediate results?
We do not ask these questions while designing sequential algorithms.
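The halving algorithm above can be simulated in a few lines of sequential Python: each loop iteration stands in for one parallel step in which all processors add their pair simultaneously. This is only a sketch of the idea, and it assumes the input length is a power of two:

```python
# Simulated PRAM-style parallel sum: in each step, processor i adds
# the pair (a[2i], a[2i+1]); the list halves every step, so adding
# n = 2^k numbers takes log2(n) steps. A sequential simulation of the
# slides' algorithm, not a real multi-processor run.

def parallel_sum(a):
    steps = 0
    while len(a) > 1:
        # one "parallel" step: all pairs are added at once
        a = [a[2 * i] + a[2 * i + 1] for i in range(len(a) // 2)]
        steps += 1
    return a[0], steps

# 8 numbers -> 3 steps (log2 8), as the slide says
total, steps = parallel_sum([3, 1, 4, 1, 5, 9, 2, 6])
```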
14. How do we analyze a parallel algorithm?
A parallel algorithm is analyzed mainly in terms of its time, processor and work complexities.
• Time complexity T(n): how many time steps are needed?
• Processor complexity P(n): how many processors are used?
• Work complexity W(n): what is the total work done by all the processors?
For our example:
T(n) = O(log n)
P(n) = O(n)
W(n) = O(n log n)
15. How do we judge efficiency?
• Consider two parallel algorithms A1 and A2 for the same problem:
A1: W1(n) work in T1(n) time.
A2: W2(n) work in T2(n) time.
• We say A1 is more efficient than A2 if W1(n) = o(W2(n)), regardless of their time complexities.
For example, W1(n) = O(n) and W2(n) = O(n log n).
• If W1(n) and W2(n) are asymptotically the same, then A1 is more efficient than A2 if T1(n) = o(T2(n)).
For example, W1(n) = W2(n) = O(n), but
T1(n) = O(log n), T2(n) = O(log^2 n).
16. How do we judge efficiency? (continued)
• It is difficult to give a more formal definition of efficiency. Consider the following situation:
For A1, W1(n) = O(n log n) and T1(n) = O(n).
For A2, W2(n) = O(n log^2 n) and T2(n) = O(log n).
• It is difficult to say which one is the better algorithm. Though A1 is more efficient in terms of work, A2 runs much faster.
• Both algorithms are interesting, and one may be better than the other depending on the specific parallel machine.
17. Optimal parallel algorithms
• Consider a problem, and let T(n) be the worst-case upper bound on the time of a serial algorithm for an input of length n.
• Assume also that T(n) is the lower bound for solving the problem. Hence, we cannot have a better upper bound.
• Consider a parallel algorithm for the same problem that does W(n) work in Tpar(n) time.
The parallel algorithm is work-optimal if W(n) = O(T(n)).
It is work-time-optimal if Tpar(n) cannot be improved.
19. A work-optimal algorithm for adding n numbers
Step 1:
• Use only n/log n processors and assign log n numbers to each processor.
• Each processor adds its log n numbers sequentially in O(log n) time.
Step 2:
• We now have only n/log n numbers left. We execute our original algorithm on these n/log n numbers.
• Now T(n) = O(log n) and
W(n) = O((n/log n) × log n) = O(n).
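The two-step work-optimal algorithm can also be sketched as a sequential simulation. The group size and the zero-padding of odd-length rounds below are implementation details not spelled out in the slides:

```python
# Sketch of the work-optimal sum: roughly n/log n groups of about
# log n numbers are each summed sequentially (step 1), then the
# pairwise tree algorithm combines the partial sums in O(log n)
# further steps (step 2). Sequential simulation only.
import math

def work_optimal_sum(a):
    n = len(a)
    g = max(1, round(math.log2(n)))  # group size ~ log n
    # step 1: each simulated processor sums its group sequentially
    partial = [sum(a[i:i + g]) for i in range(0, n, g)]
    # step 2: pairwise tree of additions on the n/log n partial sums
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0)  # pad so the pairs line up (an implementation choice)
        partial = [partial[2 * i] + partial[2 * i + 1]
                   for i in range(len(partial) // 2)]
    return partial[0]

# 64 numbers -> groups of 6, then a short tree over 11 partial sums
total = work_optimal_sum(list(range(64)))
```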
20. Why is parallel computing important?
• We can justify the importance of parallel computing for two reasons:
– very large application domains, and
– physical limitations of VLSI circuits.
• Though computers are getting faster and faster, user demand for solving very large problems is growing at a still faster rate.
• Some examples include weather forecasting, simulation of protein folding, and computational physics.
21. Physical limitations of VLSI circuits
• The Pentium III processor uses 180-nanometre (nm) technology, i.e., a circuit element like a transistor can be etched within 180 × 10^-9 m.
• The Pentium IV processor uses 160 nm technology.
• Intel has recently trialed processors made using 65 nm technology.
22. How many transistors can we pack?
• The Pentium III has about 42 million transistors, and the Pentium IV about 55 million.
• The number of transistors on a chip is approximately doubling every 18 months (Moore's Law).
• There are now 100 transistors for every ant on Earth.
23. 23
Physical limitations of VLSI circuits
• All semiconductor devices are Si based. It is fairly safe to assume
that a circuit element will take at least a single Si atom.
• The covalent Si–Si bond length is approximately 0.235 nm.
• Hence, we will reach the limit of miniaturization very soon.
• The upper bound on the speed of electronic signals is 3 x 10^8 m/sec,
the speed of light.
• Hence, communication between two adjacent transistors will take
approximately 10^-18 sec.
• If we assume that a floating point operation involves switching of at
least a few thousand transistors, such an operation will take about
10^-15 sec in the limit.
• Hence, we are looking at 1000-teraflop machines at the peak of this
technology (1 TFLOPS = 10^12 FLOPS; 1 flop = one floating point operation).
• This is a very optimistic scenario.
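The arithmetic behind these limit figures can be checked in a few lines of Python (the constants are the slide's illustrative estimates, not a device model):

```python
# Back-of-the-envelope numbers behind the slide's limit argument
# (illustrative constants, not a physical device model).
c = 3.0e8                  # speed of light, m/s: upper bound on signal speed
spacing = 3.0e-10          # ~atomic spacing between adjacent elements, m
t_signal = spacing / c     # ~1e-18 s per adjacent-transistor hop

transistors_per_flop = 1000          # "a few thousand" switches per flop
t_flop = transistors_per_flop * t_signal   # ~1e-15 s per floating point op
peak_flops = 1 / t_flop              # ~1e15 FLOPS = 1000 TFLOPS

print(f"hop: {t_signal:.0e} s, flop: {t_flop:.0e} s, peak: {peak_flops:.0e} FLOPS")
```

The result, about 10^15 FLOPS, matches the "1000 teraflop" ceiling claimed above.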
24. 24
Other Problems
• The most difficult problem is to control power dissipation.
• About 75 watts is considered the maximum practical power
output of a processor.
• As we pack in more transistors, the power output goes up and
better cooling is necessary.
• Intel cooled its 8 GHz demo processor using liquid nitrogen!
25. 25
The advantages of parallel computing
• Parallel computing offers the possibility of overcoming such
physical limits by solving problems in parallel.
• In principle, thousands, even millions of processors can be
used to solve a problem in parallel and today’s fastest
parallel computers have already reached teraflop speeds.
• Today’s microprocessors already use several parallel
processing techniques, like instruction-level parallelism
and pipelined instruction fetching.
• Intel uses hyper-threading in the Pentium IV mainly because the
processor is clocked at 3 GHz while the memory bus operates
only at about 400-800 MHz.
26. 26
Problems in parallel computing
• The sequential or uni-processor computing
model is based on von Neumann’s stored
program model.
• A program is written, compiled and stored in
memory and it is executed by bringing one
instruction at a time to the CPU.
27. 27
Problems in parallel computing
• Programs are written keeping this model in mind.
Hence, there is a close match between the software
and the hardware on which it runs.
• The theoretical RAM model captures these concepts
nicely.
• There are many different models of parallel computing
and each model is programmed in a different way.
• Hence, an algorithm designer has to keep a specific
model in mind when designing an algorithm.
• Most parallel machines are suitable for solving specific
types of problems.
• Designing operating systems is also a major issue.
29. 29
The PRAM model
• Each processor should be able to access any
memory location in each clock cycle.
• Hence, there may be conflicts in memory
access. Also, memory management hardware
needs to be very complex.
• We need some kind of hardware to connect
the processors to individual locations in the
shared memory.
31. Models of parallel computation
Parallel computational models can be
broadly classified into two categories,
• Single Instruction Multiple Data (SIMD)
• Multiple Instruction Multiple Data (MIMD)
31
32. Models of parallel computation
• SIMD models are used for solving
problems which have regular structures.
We will mainly study SIMD models in this
course.
• MIMD models are more general and used
for solving problems which lack regular
structures.
32
33. SIMD models
An N- processor SIMD computer has the
following characteristics :
• Each processor can store both program
and data in its local memory.
• Each processor stores an identical copy
of the same program in its local memory.
33
34. SIMD models
• At each clock cycle, each processor
executes the same instruction from this
program. However, the data are different
in different processors.
• The processors communicate among
themselves either through an
interconnection network or through a
shared memory.
34
35. Design issues for network
SIMD models
• A network SIMD model is a graph. The
nodes of the graph are the processors
and the edges are the links between the
processors.
• Since each processor solves only a small
part of the overall problem, it is necessary
that processors communicate with each
other while solving the overall problem.
35
36. Design issues for network
SIMD models
• The main design issues for network SIMD
models are communication diameter,
bisection width, and scalability.
• We will discuss two most popular network
models, mesh and hypercube in this
lecture.
36
37. Communication diameter
• The communication diameter is the diameter of the graph
that represents the network model. The diameter of a
graph is the maximum, over all pairs of nodes, of the
length of the shortest path between them.
• If the diameter of a model is d, the lower bound
for any computation on that model is Ω(d).
37
38. Communication diameter
• The data can be distributed in such a way
that the two furthest nodes may need to
communicate.
38
40. Bisection width
• The bisection width of a network model is
the minimum number of links that must be
removed to decompose the graph into two
equal parts.
• If the bisection width is large, more
information can be exchanged between
the two halves of the graph and hence
problems can be solved faster.
40
42. Scalability
• A network model must be scalable so that
more processors can be easily added
when new resources are available.
• The model should be regular so that each
processor has a small number of links
incident on it.
42
43. Scalability
• If the number of links is large for each
processor, it is difficult to add new
processors as too many new links have to
be added.
• If we want to keep the diameter small, we
need more links per processor. If we want
our model to be scalable, we need fewer
links per processor.
43
44. Diameter and Scalability
• The best model in terms of diameter is the
complete graph. The diameter is 1.
However, if we need to add a new node to
an n-processor machine, we need n - 1
new links.
44
45. Diameter and Scalability
• The best model in terms of scalability is
the linear array. We need to add only one
link for a new processor. However, the
diameter is n for a machine with n
processors.
45
46. The mesh architecture
• Each internal processor of a 2-dimensional
mesh is connected to 4 neighbors.
• When we combine two different meshes,
only the processors on the boundary need
extra links. Hence it is highly scalable.
46
47. The mesh architecture
• Both the diameter and the bisection width of an
n-processor, 2-dimensional mesh are O(√n).
(Figure: a 4 x 4 mesh)
47
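As a quick check of the diameter claim (a trivial helper of my own, not from the slides):

```python
def mesh_diameter(r, c):
    """Diameter of an r x c mesh: the two opposite corner processors
    are (r - 1) + (c - 1) hops apart, and no pair is farther."""
    return (r - 1) + (c - 1)

# For an n-processor sqrt(n) x sqrt(n) mesh this is 2(sqrt(n) - 1) = O(sqrt(n)).
print(mesh_diameter(4, 4))  # 6
```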
48. The hypercube architecture
(Figure: hypercubes of 0, 1, 2 and 3 dimensions)
48
49. • The diameter of a d-dimensional
hypercube is d, as we need to flip at most d
bits (traverse d links) to reach one
processor from another.
• The bisection width of a d-dimensional
hypercube is 2^(d-1).
The hypercube architecture
49
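The bit-flipping argument for the diameter can be sketched as a routing function (an illustrative helper, not part of the lecture's algorithms):

```python
def hypercube_route(src, dst, d):
    """Route from src to dst in a d-dimensional hypercube by flipping
    differing address bits one at a time; each flip traverses one link,
    so at most d links are needed."""
    path = [src]
    node = src
    for bit in range(d):
        if (node ^ dst) & (1 << bit):  # this address bit differs
            node ^= 1 << bit           # cross the link in dimension 'bit'
            path.append(node)
    return path

# In a 3-cube, 000 -> 111 differs in all 3 bits, so the route uses 3 links.
print(hypercube_route(0b000, 0b111, 3))  # [0, 1, 3, 7]
```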
50. • The hypercube is a highly scalable
architecture. Two d-dimensional
hypercubes can be easily combined to
form a d+1-dimensional hypercube.
• The hypercube has several variants like
butterfly, shuffle-exchange network and
cube-connected cycles.
The hypercube architecture
50
55. 55
Complexity Analysis: Given n processors
connected via a hypercube, S_Sum_Hypercube needs
log n rounds to compute the sum. Since n messages
are sent and received in each round, the total number of
messages is O(n log n).
1. Time complexity: O(log n).
2. Message complexity: O(n log n).
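A sequential Python simulation of this hypercube sum (my own sketch of S_Sum_Hypercube's communication pattern, assuming n is a power of two):

```python
import math

def hypercube_sum(values):
    """Simulate summing n = 2^d values on a d-dimensional hypercube:
    in round i, each processor whose bit i is 0 receives its neighbor's
    partial sum across dimension i, so after log n rounds the total
    accumulates at processor 0."""
    n = len(values)
    d = int(math.log2(n))
    v = list(values)
    for i in range(d):                    # log n rounds
        for p in range(n):
            if p & (1 << i) == 0:         # lower node of each pair receives
                v[p] += v[p | (1 << i)]
    return v[0]

print(hypercube_sum([1] * 8))  # 8
```

Each inner loop stands for one round of n simultaneous messages, matching the O(n log n) message count above.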
56. Classification of the PRAM model
• In the PRAM model, processors
communicate by reading from and writing
to the shared memory locations.
56
57. Classification of the PRAM
model
• The power of a PRAM depends on the
kind of access to the shared memory
locations.
57
58. Classification of the PRAM
model
In every clock cycle,
• In the Exclusive Read Exclusive Write
(EREW) PRAM, each memory location
can be accessed only by one processor.
• In the Concurrent Read Exclusive Write
(CREW) PRAM, multiple processors can
read from the same memory location, but
only one processor can write.
58
59. Classification of the PRAM
model
• In the Concurrent Read Concurrent Write
(CRCW) PRAM, multiple processors can
read from or write to the same memory
location.
59
60. Classification of the PRAM
model
• It is easy to allow concurrent reading.
However, concurrent writing gives rise to
conflicts.
• If multiple processors write to the same
memory location simultaneously, it is not
clear what is written to the memory
location.
60
61. Classification of the PRAM
model
• In the Common CRCW PRAM, all the
processors must write the same value.
• In the Arbitrary CRCW PRAM, one of the
processors arbitrarily succeeds in writing.
• In the Priority CRCW PRAM, processors
have priorities associated with them and
the highest priority processor succeeds in
writing.
61
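These three write-resolution rules can be sketched as a small function (a hypothetical helper; it assumes a lower processor id means higher priority):

```python
def crcw_write(requests, mode):
    """Resolve simultaneous writes to one shared cell under the three
    CRCW rules. requests: list of (processor_id, value) pairs;
    a lower id is taken to mean higher priority."""
    if mode == "common":
        vals = {v for _, v in requests}
        assert len(vals) == 1, "Common CRCW: all writers must agree"
        return vals.pop()
    if mode == "arbitrary":
        return requests[0][1]      # any one write succeeds (first, here)
    if mode == "priority":
        return min(requests)[1]    # highest-priority (lowest id) wins
    raise ValueError(mode)

print(crcw_write([(3, 9), (1, 7), (2, 5)], "priority"))  # 7
```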
62. Classification of the PRAM
model
• The EREW PRAM is the weakest and the
Priority CRCW PRAM is the strongest
PRAM model.
• The relative powers of the different PRAM
models are as follows.
62
63. Classification of the PRAM
model
• An algorithm designed for a weaker
model can be executed within the same
time and work complexities on a
stronger model.
63
64. Classification of the PRAM
model
• We say model A is less powerful
than model B if either:
• the time complexity for solving a
problem is asymptotically less in model
B than in model A, or
• the time complexities are the same,
but the processor or work complexity is
asymptotically less in model B than in
model A.
64
65. Classification of the PRAM
model
An algorithm designed for a stronger PRAM
model can be simulated on a weaker model
either with asymptotically more processors
(work) or with asymptotically more time.
65
67. Adding n numbers on a PRAM
• This algorithm works on the EREW PRAM
model as there are no read or write
conflicts.
• We will use this algorithm to design a
matrix multiplication algorithm on the
EREW PRAM.
67
For simplicity, we assume that n = 2^p for some integer p.
Matrix multiplication
68
69. Matrix multiplication
• Each c_{i,j}, 1 ≤ i, j ≤ n, can be computed in
parallel.
• We allocate n processors for computing c_{i,j}.
Suppose these processors are P1, P2,…,Pn.
• In the first time step, processor P_m, 1 ≤ m ≤ n,
computes the product a_{i,m} x b_{m,j}.
• We now have n numbers, and we use the
addition algorithm to sum these n numbers
in log n time.
69
70. Matrix multiplication
• Computing each c_{i,j}, 1 ≤ i, j ≤ n, takes n
processors and log n time.
• Since there are n^2 such c_{i,j}'s, we need
overall O(n^3) processors and O(log n)
time.
• The processor requirement can be
reduced to O(n^3 / log n). Exercise!
• Hence, the work complexity is O(n^3).
70
71. Matrix multiplication
• However, this algorithm requires
concurrent read capability.
• Note that, each element ai,j (and bi,j)
participates in computing n elements from
the C matrix.
• Hence n different processors will try to
read each ai,j (and bi,j) in our algorithm.
71
For simplicity, we assume that n = 2^p for some integer p.
Matrix multiplication
72
73. Matrix multiplication
• Hence our algorithm runs on the CREW
PRAM and we need to avoid the read
conflicts to make it run on the EREW
PRAM.
• We will create n copies of each of the
elements ai,j (and bi,j). Then one copy can
be used for computing each ci,j .
73
74. Matrix multiplication
Creating n copies of a number in O (log n)
time using O (n) processors on the EREW
PRAM.
• In the first step, one processor reads the
number and creates a copy. Hence, there
are two copies now.
• In the second step, two processors read
these two copies and create four copies.
74
75. Matrix multiplication
• Since the number of copies doubles in
every step, n copies are created in O(log n)
steps.
• Though we need n processors, the
processor requirement can be reduced to
O(n / log n).
75
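The doubling scheme can be sketched in a few lines of Python (an illustrative sequential simulation; each while-iteration stands for one EREW step in which every existing copy is read by exactly one processor):

```python
def broadcast_copies(x, n):
    """Create n copies of x by doubling: in each step, every existing
    copy is read once and one new copy is written, so no location is
    read or written concurrently (EREW) and n copies appear in
    ceil(log2(n)) steps."""
    copies = [x]
    steps = 0
    while len(copies) < n:
        copies += copies[:n - len(copies)]  # each copy spawns one more
        steps += 1
    return copies, steps

copies, steps = broadcast_copies(42, 8)
print(len(copies), steps)  # 8 3
```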
76. Matrix multiplication
• Since there are n^2 elements in the matrix A
(and in B), we need O(n^3 / log n)
processors and O(log n) time to create n
copies of each element.
• After this, there are no read conflicts in our
algorithm. The overall matrix multiplication
algorithm now takes O(log n) time and
O(n^3 / log n) processors on the EREW
PRAM.
76
78. 78
Using n^3 Processors
Algorithm MatMult_CREW
/* Step 1 */
forall P_{i,j,k}, where 1 ≤ i, j, k ≤ n, do in parallel
C[i,j,k] = A[i,k] * B[k,j]
endfor
/* Step 2 */
for l = 1 to log n do
forall P_{i,j,k}, where 1 ≤ i, j ≤ n and 1 ≤ k ≤ n/2, do in parallel
if (2k mod 2^l) = 0 then
C[i,j,2k] = C[i,j,2k] + C[i,j,2k - 2^(l-1)]
endif
endfor
endfor
/* The output matrix is stored in locations C[i,j,n], where 1 ≤ i, j ≤ n */
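A sequential Python sketch of this algorithm (my own simulation of the two steps, assuming n is a power of two):

```python
import math

def matmult_pram(A, B):
    """Sequential sketch of the CREW PRAM product of n x n matrices,
    n a power of two. Step 1 forms all n^3 products, one per virtual
    processor; Step 2 sums over k via log n rounds of pairwise addition."""
    n = len(A)
    # Step 1: C[i][j] holds the n products A[i][k] * B[k][j]
    C = [[[A[i][k] * B[k][j] for k in range(n)]
          for j in range(n)] for i in range(n)]
    # Step 2: each round halves the list, mirroring the tree reduction
    for _ in range(int(math.log2(n))):
        for i in range(n):
            for j in range(n):
                C[i][j] = [C[i][j][2 * t] + C[i][j][2 * t + 1]
                           for t in range(len(C[i][j]) // 2)]
    return [[C[i][j][0] for j in range(n)] for i in range(n)]

print(matmult_pram([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The outer rounds correspond to the `for l = 1 to log n` loop above; the inner comprehension plays the role of the parallel `forall`.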
79. 79
Complexity Analysis
• In the first step, the products are computed in parallel
in constant time, that is, O(1).
• These products are summed in O(log n) time during
the second step. Therefore, the run time is O(log n).
• Since the number of processors used is n^3, the cost is
O(n^3 log n).
1. Run time, T(n) = O(log n).
2. Number of processors, P(n) = n^3.
3. Cost, C(n) = O(n^3 log n).
80. 80
Reducing the Number of Processors
• In the above algorithm, although all the processors were busy
during the first step, not all of them performed addition
operations during the second step.
• The second step consists of log n iterations.
• During the first iteration, only n^3/2 processors perform
additions; only n^3/4 perform additions in the second
iteration, and so on.
• With this understanding, we may be able to use a smaller
machine with only n^3/log n processors.
81. 81
Reducing the Number of Processors
1. Each processor P_{i,j,k}, where 1 ≤ i, j ≤ n and
1 ≤ k ≤ n/log n, computes the sum of log n products.
This step produces n^3/log n partial sums.
2. The sums of products produced in step 1 are
added to produce the resulting matrix as
discussed before.
82. 82
Complexity Analysis
1. Run time, T(n) = O(log n).
2. Number of processors, P(n) = n^3/log n.
3. Cost, C(n) = O(n^3).