SDSU CS 505 Lecture Slides
Spring 2002
Instructors:
Amit Majumdar
Faramarz Valafar
Introduction
(Review of serial computing as needed)
• Parallel Computing – Real Life Scenario
• Cost Effectiveness of Parallel Processors
• What is parallel computing?
• Why do parallel computing?
• Types of parallel computing
• What are some limits of parallel computing?
Parallel Computing – Real Life Scenario
• Stacking or reshelving of a set of library books.
Assume books are organized into shelves and
shelves are grouped into bays.
A single worker can do this only at a certain rate.
We can speed it up by employing multiple workers.
What is the best strategy?
1. The simple way is to divide the books equally among the
workers. Each worker stacks its share one book at a time,
but must walk all over the library.
2. An alternative is to assign a fixed, disjoint set of bays
to each worker. Each worker is given an equal number of
books, chosen arbitrarily. Workers stack books belonging to
their own bays and pass the rest to the worker responsible
for the right bay.
Parallel Computing – Real Life Scenario
• Parallel processing lets us accomplish a task faster by
dividing the work into a set of subtasks assigned to
multiple workers.
• Assigning sets of books to workers is task partitioning.
Passing books to one another is an example of
communication between subtasks.
• For some problems, distributing work to multiple workers
might be more time consuming than doing it locally.
• Some problems may be completely serial, e.g. digging a
post hole; these are poorly suited to parallel processing.
• Not all problems are equally amenable to parallel
processing.
Weather Modeling and Forecasting
Consider a region of 3000 x 3000 miles with a height of 11 miles.
Partitioning it into segments of 0.1 x 0.1 x 0.1 cubic miles gives
~10^11 segments.
Take a 2-day forecast in which the parameters must be recomputed every
30 min, and assume each segment takes 100 instructions to update. A
single update of the whole grid then takes 10^13 instructions, and the
two days (~100 updates) take on the order of 10^15 instructions in
total. On a serial computer executing 10^9 instrs/sec, this takes
about 10^6 seconds, i.e. roughly 280 hrs to predict the next 48 hrs !!
Now take 1000 processors, each capable of 10^8 instrs/sec. Each
processor handles 10^8 segments, about 10^12 instructions over the
2 days, and the calculation is done in about 3 hrs !!
Currently all major weather forecast centers (US, Europe, Asia) have
supercomputers with 1000s of processors.
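To sanity-check this arithmetic, here is a small C sketch (ours, not
from the slides) that recomputes both estimates from the stated
parameters:

  #include <stdio.h>

  int main(void) {
      double segments   = 1e11;   /* 0.1x0.1x0.1 mile cells over 3000x3000x11 miles */
      double instr_each = 100.0;  /* instructions per segment per update (given)    */
      double updates    = 96.0;   /* every 30 min for 2 days                        */
      double total      = segments * instr_each * updates;  /* ~10^15 instructions  */

      double serial_rate   = 1e9;           /* one computer at 10^9 instrs/sec        */
      double parallel_rate = 1000.0 * 1e8;  /* 1000 processors at 10^8 instrs/sec each */

      printf("serial:   %.0f hours\n", total / serial_rate / 3600.0);
      printf("parallel: %.1f hours\n", total / parallel_rate / 3600.0);
      return 0;
  }

This prints about 267 serial hours and 2.7 parallel hours; the slide's
round figure of 10^15 total instructions gives the quoted ~280 hrs and
~3 hrs.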
Other Examples
• Vehicle design and dynamics
• Analysis of protein structures
• Human genome work
• Quantum chromodynamics, Astrophysics
• Ocean modeling
• Imaging and Rendering
• Petroleum exploration
• Nuclear Weapon design
• Database query
• Ozone layer monitoring
• Natural language understanding
• Study of chemical phenomena
• And many other grand challenge projects
Cost Effectiveness of Parallel Processors
• Currently the speed of off-the-shelf microprocessors is within one
order of magnitude of the fastest serial computers, yet microprocessors
cost many orders of magnitude less.
• Connecting even a few microprocessors yields a parallel computer with
speed comparable to the fastest serial computers, at far lower cost.
• Connecting a large number of processors into a parallel computer
overcomes the performance saturation of serial computers.
• Parallel computers can provide much higher raw computational power
than the fastest serial computers.
• We need actual applications that can take advantage of this high
power.
What is Parallel Computing?
• Parallel computing: use of multiple computers or
processors working together on a common task.
– Each processor works on its section of the problem
– Processors can exchange information
[Figure: a grid of the problem to be solved, divided into four areas
along x and y; CPU #1 - CPU #4 each work on one area of the problem
and exchange data across the shared boundaries.]
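As a concrete illustration of the two ingredients above, here is a
minimal C/MPI sketch (our example, not from the slides; the ring
exchange pattern is invented for illustration):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which CPU am I? */
      MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many CPUs?  */

      double my_section = (double)rank;      /* this CPU's piece of the problem */
      double neighbor   = 0.0;

      /* exchange information with neighboring CPUs (here, around a ring) */
      int right = (rank + 1) % size;
      int left  = (rank - 1 + size) % size;
      MPI_Sendrecv(&my_section, 1, MPI_DOUBLE, right, 0,
                   &neighbor,   1, MPI_DOUBLE, left,  0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      printf("CPU %d of %d received %g from its left neighbor\n",
             rank, size, neighbor);
      MPI_Finalize();
      return 0;
  }

Each process works on its own section and exchanges boundary
information, exactly the pattern sketched in the grid figure.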
Why Do Parallel Computing?
• Limits of single CPU computing
– Available memory
– Performance
• Parallel computing allows us to:
– Solve problems that don’t fit in a single CPU’s
memory space
– Solve problems that can’t be solved in a
reasonable time
• We can run…
– Larger problems
– Faster
– More cases
Types of Parallelism : Two Extremes
• Data parallel
– Each processor performs the same task on
different data
– Example - grid problems
• Task parallel
– Each processor performs a different task
– Example - signal processing
• Most applications fall somewhere on the
continuum between these two extremes
Typical Data Parallel Program
• Example: integrate 2-D propagation problem:
Starting partial differential equation:

  ∂Ψ/∂t = D · ∂²Ψ/∂x² + B · ∂²Ψ/∂y²

Finite difference approximation:

  (f(i,j,n+1) − f(i,j,n)) / Δt
      = D · (f(i+1,j,n) − 2 f(i,j,n) + f(i−1,j,n)) / Δx²
      + B · (f(i,j+1,n) − 2 f(i,j,n) + f(i,j−1,n)) / Δy²

where f(i,j,n) is the field value at grid point (i,j) at time step n.

[Figure: the 2-D grid is partitioned along x and y among processing
elements PE #0 - PE #7, each owning one block.]
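One time step of this finite-difference update, written as a plain
serial loop in C (a sketch with assumed grid size, coefficients, and
boundary handling, before the grid is split among PEs):

  #include <stdio.h>
  #define NX 8
  #define NY 8

  int main(void) {
      double f[NX][NY] = {{0.0}}, fnew[NX][NY] = {{0.0}};
      double D = 0.1, B = 0.1, dt = 0.01, dx = 1.0, dy = 1.0;
      f[NX/2][NY/2] = 1.0;                    /* an initial disturbance */

      for (int i = 1; i < NX - 1; i++)        /* interior points only */
          for (int j = 1; j < NY - 1; j++)
              fnew[i][j] = f[i][j] + dt *
                  (D * (f[i+1][j] - 2.0*f[i][j] + f[i-1][j]) / (dx*dx) +
                   B * (f[i][j+1] - 2.0*f[i][j] + f[i][j-1]) / (dy*dy));

      printf("center value after one step: %g\n", fnew[NX/2][NY/2]);
      return 0;
  }

In the data-parallel version each PE owns one block of the grid and
applies exactly this loop to its own interior points, exchanging edge
values with its neighbors.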
Basics of Data Parallel Programming
• One code will run on 2 CPUs
• Program has an array of data to be operated on by the 2 CPUs,
so the array is split into two parts.
program:
  ...
  if CPU=a then
    low_limit=1
    upper_limit=50
  elseif CPU=b then
    low_limit=51
    upper_limit=100
  end if
  do I = low_limit, upper_limit
    work on A(I)
  end do
  ...
end program

CPU A effectively runs:

program:
  ...
  low_limit=1
  upper_limit=50
  do I = low_limit, upper_limit
    work on A(I)
  end do
  ...
end program

CPU B effectively runs:

program:
  ...
  low_limit=51
  upper_limit=100
  do I = low_limit, upper_limit
    work on A(I)
  end do
  ...
end program
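The same splitting, as a runnable C/MPI sketch for two CPUs (the array
A and the 1-50 / 51-100 split come from the pseudocode; the MPI ranks
standing in for CPUs a and b, and the placeholder work, are our
assumptions). Run with two processes, e.g. mpirun -np 2:

  #include <mpi.h>
  #include <stdio.h>
  #define N 100

  int main(int argc, char **argv) {
      int rank;
      double A[N];
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* rank 0 plays CPU a (elements 1..50), rank 1 plays CPU b (51..100);
         C arrays are 0-based, so the pseudocode limits shift down by one */
      int low  = (rank == 0) ? 0  : 50;
      int high = (rank == 0) ? 50 : N;

      for (int i = low; i < high; i++)
          A[i] = 2.0 * i;                    /* stands in for "work on A(I)" */

      printf("CPU %c worked on A[%d..%d]\n",
             rank == 0 ? 'a' : 'b', low, high - 1);
      MPI_Finalize();
      return 0;
  }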
Typical Task Parallel Application
• Example: Signal Processing
• Use one processor for each task
• Can use more processors if one is overloaded
DATA → Normalize Task → FFT Task → Multiply Task → Inverse FFT Task
Basics of Task Parallel Programming
• One code will run on 2 CPUs
• Program has 2 tasks (a and b) to be done by 2
CPUs
program.f:
  ...
  initialize
  ...
  if CPU=a then
    do task a
  elseif CPU=b then
    do task b
  end if
  ...
end program

CPU A effectively runs:

program.f:
  ...
  initialize
  ...
  do task a
  ...
end program

CPU B effectively runs:

program.f:
  ...
  initialize
  ...
  do task b
  ...
end program
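The task-parallel counterpart as a C/MPI sketch (again our
illustration; the two tasks are placeholders for real work such as the
signal-processing stages above):

  #include <mpi.h>
  #include <stdio.h>

  /* placeholder tasks; in the signal-processing example these would be
     stages like normalize, FFT, multiply, inverse FFT */
  static void task_a(void) { printf("doing task a\n"); }
  static void task_b(void) { printf("doing task b\n"); }

  int main(int argc, char **argv) {
      int rank;
      MPI_Init(&argc, &argv);                /* "initialize" */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0)                         /* CPU a */
          task_a();
      else if (rank == 1)                    /* CPU b */
          task_b();

      MPI_Finalize();
      return 0;
  }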
Limits of Parallel Computing
• Theoretical Upper Limits
– Amdahl’s Law
• Practical Limits
– Load balancing
– Non-computational sections
• Other Considerations
– time to re-write code
Theoretical Upper Limits to Performance
• All parallel programs contain:
– Serial sections
– Parallel sections
• Serial sections limit the parallel effectiveness
• Speedup is the ratio of the time required to run a
code on one processor to the time required to run
the same code on multiple (N) processors
• Amdahl’s Law states this formally
Amdahl’s Law
• Amdahl’s Law places a strict limit on the speedup that can
be realized by using multiple processors.
– Effect of multiple processors on run time:
    t_N = (f_p / N + f_s) · t_1
– Effect of multiple processors on speedup:
    S = 1 / (f_s + f_p / N)
– Where
• f_s = serial fraction of code
• f_p = parallel fraction of code
• N = number of processors
• t_N = time to run on N processors
It takes only a small fraction of serial content in a code to
degrade the parallel performance.
[Figure: Illustration of Amdahl's Law. Speedup vs. number of
processors (0 to 250) for f_p = 1.000, 0.999, 0.990, and 0.900.]
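The curves above are easy to regenerate. A small C sketch (ours, not
from the slides) that evaluates S = 1 / (f_s + f_p / N) for the
plotted parallel fractions:

  #include <stdio.h>

  int main(void) {
      double fps[] = {1.000, 0.999, 0.990, 0.900}; /* parallel fractions from the plot */
      int    Ns[]  = {10, 50, 100, 250};           /* processor counts to sample       */

      for (int k = 0; k < 4; k++) {
          double fp = fps[k], fs = 1.0 - fp;
          printf("fp = %.3f:", fp);
          for (int m = 0; m < 4; m++)
              printf("  S(N=%d) = %6.1f", Ns[m], 1.0 / (fs + fp / Ns[m]));
          printf("\n");
      }
      return 0;
  }

Even with f_p = 0.99, the speedup on 250 processors is only about 72,
far below the ideal 250.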
Practical Limits: Amdahl’s Law vs. Reality
Amdahl’s Law provides a theoretical upper limit on parallel speedup,
assuming that there are no costs for communications. In reality,
communications will result in a further degradation of performance.
[Figure: speedup vs. number of processors (0 to 250) for f_p = 0.99,
comparing the Amdahl's Law prediction with reality; the measured curve
falls increasingly below the theoretical one.]
Practical Limits: Amdahl’s Law vs. Reality
• In reality, the speedup predicted by Amdahl’s Law is
further limited by many things:
– Communications
– I/O
– Load balancing (waiting)
– Scheduling (shared processors or memory)
Other Considerations
• Writing an effective parallel application is difficult!
– Load balance is important
– Communication can limit parallel efficiency
– Serial time can dominate
• Is it worth your time to rewrite your application?
– Do the CPU requirements justify parallelization?
– Will the code be used just once?
Issues in Parallel Computing
• Design of Parallel Computers : design machines that scale
to a large # of processors and are capable of supporting
fast communication and data sharing among processors.
• Design of Efficient Algorithms : designing parallel
algorithms is different from designing serial algorithms.
A significant amount of work is being done on both numerical
and non-numerical parallel algorithms.
• Methods of Evaluating Parallel Algorithms : given a
parallel computer and a parallel algorithm, we need to
evaluate the performance of the resulting system: how fast
the problem is solved and how efficiently the processors
are used.
Issues in Parallel Computing
• Parallel Computer Languages : parallel algorithms are
implemented in a programming language. This language must be
flexible enough to allow efficient implementation, must be
easy to program in, and must use the hardware efficiently.
• Parallel Programming Tools : tools (compilers, libraries,
debuggers, and other monitoring or performance-evaluation
tools) must shield users from low-level machine
characteristics.
• Portable Parallel Programs : this is one of the main
problems with current parallel computers. Programs written
for one parallel computer require extensive work to port to
another parallel computer.
Issues in Parallel Computing
• Automatic Programming of Parallel Computers : this concerns
the design of parallelizing compilers, which extract implicit
parallelism from programs that have not been explicitly
parallelized. So far this approach has shown limited potential
for exploiting the power of large parallel machines.