1
Welcome to
2
Instructor – Bassem Abd-El-Atty
◼ Classroom: 1
◼ Lecture: Sundays 3:00 PM – 6:00 PM
◼ Office: Third floor
◼ E-Mail: bassem.abdelatty@fci.luxor.edu.eg
3
Introductions
Name
Preferred Name
Interesting Fact / Hobby / Excited to Learn
4
Discussion
◼ What did we learn? Why do this?
◼ Why do this with computers?
◼ What would happen if we could only move
one quantum at a time?
5
Course Description
Topics covered in this course will include: Introduction to
parallel computation models − Data and task parallelism −
Shared and distributed memory parallel machine
architecture concepts − Interconnection networks −
Basics of threaded parallel computation − Parallel
algorithmic design − Languages and libraries for threaded
parallel programming − Languages and libraries for
distributed memory parallel programming − Co-processor
techniques − Experimental techniques − Measuring
performance and computing speed-up.
6
Outline
◼ Why parallel computing
◼ Parallel hardware and parallel software
◼ Shared Memory and OpenMP Programming
◼ Distributed Memory and MPI Programming
◼ Parallel Algorithms, Performance Models,
Scalability, etc.
◼ Shared-memory programming with
Pthreads
◼ GPU programming with CUDA
7
Resources
8
Grading
◼ Midterm Exam
◼ Quizzes
◼ Final Project
◼ Final implementation
◼ Presentation
◼ Paper Presentation
◼ Programming Assignments and Homework
9
Copyright © 2010, Elsevier Inc. All rights Reserved
Chapter 1
Why Parallel Computing?
10
Copyright © 2010, Elsevier Inc. All rights Reserved
Roadmap
◼ Why we need ever-increasing performance.
◼ Why we’re building parallel systems.
◼ Why we need to write parallel programs.
◼ How do we write parallel programs?
◼ What we’ll be doing.
◼ Concurrent, parallel, distributed!
11
Changing times
Copyright © 2010, Elsevier Inc. All rights Reserved
◼ From 1986 to 2002, microprocessors were
speeding like a rocket, increasing in
performance by an average of 50% per year.
◼ Since then, it has dropped to about a 20%
increase per year.
12
An intelligent solution
Copyright © 2010, Elsevier Inc. All rights Reserved
◼ Instead of designing and building faster
microprocessors, put multiple processors
on a single integrated circuit.
13
Now it’s up to the programmers
◼ Adding more processors doesn’t help
much if programmers aren’t aware of
them…
◼ … or don’t know how to use them.
◼ Serial programs don’t benefit from this
approach (in most cases).
Copyright © 2010, Elsevier Inc. All rights Reserved
14
Why we need ever-increasing
performance
◼ Computational power is increasing, but so
are our computation problems and needs.
◼ Problems we never dreamed of have been
solved because of past increases, such as
decoding the human genome.
◼ More complex problems are still waiting to
be solved.
Copyright © 2010, Elsevier Inc. All rights Reserved
15
Weather forecast
Copyright © 2010, Elsevier Inc. All rights Reserved
16
Protein folding
Copyright © 2010, Elsevier Inc. All rights Reserved
17
Drug discovery
Copyright © 2010, Elsevier Inc. All rights Reserved
18
Data analysis
Copyright © 2010, Elsevier Inc. All rights Reserved
19
Why we’re building parallel
systems
◼ Up to now, performance increases have
been attributable to increasing density of
transistors.
◼ But there are inherent problems.
Copyright © 2010, Elsevier Inc. All rights Reserved
Moore’s law
• Commonly (mis)stated as
• “Computer performance doubles every 18 months”
• Gordon Moore observed in 1965
• “The complexity… has increased roughly a factor of
two per year. [It] can be expected to continue…for at
least 10 years”
• It’s about the number of transistors per chip
• Funny thing is: it held true for 40+ years
• And still going, perhaps until 2030
• A “self-fulfilling prophecy”
20
21
Number of Transistors/chip?
• Well, they will keep on growing for the next 5 years
• Maybe a bit more slowly
• Current technology is 14 nanometers (now 3 nanometers)
• AMD EPYC 7401P (19.2 billion transistors on 4 dies on the package)
• 10 nm
• We may go to a 2 nanometer feature size
• i.e., the gap between two wires (as a simple definition)
• For comparison:
• The distance between a carbon and a hydrogen atom is about 1 angstrom = 0.1
nanometer!
• Silicon-silicon bonds are longer
• ~5 Å lattice spacing (image: Wikipedia)
• i.e., about 0.5 nanometer
• So, we are getting close to atomic dimensions!
22
Consequence
• We will get to 30-50 billion transistors/chip!
• What to do with them?
• Put more processors on a chip
• Beginning of the multicore era after 2003
• Number of cores per chip doubles every X years
• X= 2? 3?
23
24
A little physics lesson
◼ Smaller transistors = faster processors.
◼ Faster processors = increased power
consumption.
◼ Increased power consumption = increased
heat.
◼ Increased heat = unreliable processors.
Copyright © 2010, Elsevier Inc. All rights Reserved
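A one-line justification for the faster-equals-hotter chain above: the dynamic power of CMOS logic grows roughly with the switched capacitance, the square of the supply voltage, and the clock frequency, and pushing the frequency up generally also requires a higher voltage. This is a standard approximation, not a formula from the slide:

```latex
P_{\text{dynamic}} \approx C \, V^{2} \, f
```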
25
Solution
◼ Move away from single-core systems to
multicore processors.
◼ “core” = central processing unit (CPU)
Copyright © 2010, Elsevier Inc. All rights Reserved
◼ Introducing parallelism!!!
26
Alternative: Parallelism
◼ If we find killer apps that need all the
parallel power we can bring to bear
◼ With 50B transistors, at least 100+ processor
cores on each chip
◼ There is a tremendous competitive
advantage to building such a killer app
◼ So, given our history, we will find it
◼ What are the enabling factors?
◼ Finding the application areas.
◼ Parallel programming skills
27
A Few Candidate Areas
◼ Simple parallelism:
◼ Search images, scan files, …
◼ Speech recognition:
◼ Almost perfect already
◼ But speaker-dependent, needs a little training, and needs a
noise-free environment
◼ Frontier: speaker-independent recognition in
uncontrolled environments
◼ Broadly: Artificial intelligence
◼ Data centers (data analytics, queries, cloud
computing)
◼ And, of course, HPC (High Performance Computing)
◼ typically for CSE (Computational Science and Engineering)
28
Parallel Programming Skills
◼ So, all machines will be (are?) parallel
◼ So, almost all programs will be parallel
― True?
◼ There are 10 million programmers in the
world
― Approximate estimate
◼ All programmers must become parallel
programmers
― Right? What do you think?
29
Programming Models Innovations
◼ Expect a lot of novel programming models
◼ There is scope for new languages, unlike in recent decades
◼ Only Java broke through after C/C++
◼ This is good news:
◼ If you are a computer scientist wanting to
develop new languages
◼ Bad news:
◼ If you are an application developer
◼ DO NOT WAIT FOR an “AutoMagic”
parallelizing compiler!
30
Why we need to write parallel
programs
◼ Running multiple instances of a serial
program often isn’t very useful.
◼ Think of running multiple instances of your
favorite game.
◼ What you really want is for
it to run faster.
Copyright © 2010, Elsevier Inc. All rights Reserved
31
Approaches to the serial problem
◼ Rewrite serial programs so that they’re
parallel.
◼ Write translation programs that
automatically convert serial programs into
parallel programs.
◼ This is very difficult to do.
◼ Success has been limited.
Copyright © 2010, Elsevier Inc. All rights Reserved
32
More problems
◼ Some coding constructs can be
recognized by an automatic program
generator, and converted to a parallel
construct.
◼ However, it’s likely that the result will be a
very inefficient program.
◼ Sometimes the best parallel solution is to
step back and devise an entirely new
algorithm.
Copyright © 2010, Elsevier Inc. All rights Reserved
33
Example
◼ Compute n values and add them together.
◼ Serial solution:
Copyright © 2010, Elsevier Inc. All rights Reserved
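The code figure on this slide did not survive extraction. A minimal sketch of the serial solution, assuming Compute_next_value is a placeholder for whatever per-element computation the program performs (stubbed out here so the example compiles):

```c
#include <stdio.h>

/* Hypothetical stand-in for the slide's Compute_next_value(...):
   any function producing the i-th value to be summed would do. */
int Compute_next_value(int i) {
    return i % 10;
}

int main(void) {
    int n = 24;
    int sum = 0;

    /* Serial solution: compute n values and add them one at a time. */
    for (int i = 0; i < n; i++) {
        int x = Compute_next_value(i);
        sum += x;
    }

    printf("sum = %d\n", sum);
    return 0;
}
```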
34
Example (cont.)
◼ We have p cores, p much smaller than n.
◼ Each core performs a partial sum of
approximately n/p values.
Copyright © 2010, Elsevier Inc. All rights Reserved
Each core uses its own private variables
and executes this block of code (see the sketch below)
independently of the other cores.
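The block of code itself appears as a figure in the original slide. A hedged sketch of the same idea using OpenMP (one of the libraries covered later in the course); my_sum, my_first_i, and my_last_i follow the slide's naming, while the OpenMP scaffolding and the Compute_next_value stub are illustrative assumptions:

```c
#include <omp.h>
#include <stdio.h>

int Compute_next_value(int i) { return i % 10; }   /* stub, as before */

int main(void) {
    int n = 24;

    #pragma omp parallel
    {
        int p       = omp_get_num_threads();   /* number of "cores" (threads) */
        int my_rank = omp_get_thread_num();

        /* Each core gets a block of roughly n/p values. */
        int my_first_i = my_rank * n / p;
        int my_last_i  = (my_rank + 1) * n / p;

        int my_sum = 0;                         /* private to this core */
        for (int my_i = my_first_i; my_i < my_last_i; my_i++) {
            int my_x = Compute_next_value(my_i);
            my_sum += my_x;
        }

        printf("Core %d: my_sum = %d\n", my_rank, my_sum);
    }
    return 0;
}
```

Compile with gcc -fopenmp; each thread prints its own partial sum.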
35
Example (cont.)
◼ After each core completes execution of the
code, its private variable my_sum contains
the sum of the values computed by its calls
to Compute_next_value.
◼ E.g., with 8 cores and n = 24, the calls to
Compute_next_value return:
Copyright © 2010, Elsevier Inc. All rights Reserved
1,4,3, 9,2,8, 5,1,1, 6,2,7, 2,5,0, 4,1,8, 6,5,1, 2,3,9
36
Example (cont.)
◼ Once all the cores are done computing
their private my_sum, they form a global
sum by sending their results to a designated
“master” core, which adds them to produce
the final result.
Copyright © 2010, Elsevier Inc. All rights Reserved
37
Example (cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
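The pseudocode figure for this slide did not survive extraction. A minimal MPI sketch of the scheme just described (every core sends its partial sum to a designated master, which adds them one by one); the my_sum value is a placeholder for the partial sum each core would already have computed:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int my_rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int my_sum = my_rank + 1;   /* placeholder for this core's partial sum */

    if (my_rank == 0) {
        /* Master: receive p - 1 partial sums and add them one at a time. */
        int sum = my_sum;
        for (int q = 1; q < p; q++) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, q, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            sum += value;
        }
        printf("Global sum = %d\n", sum);
    } else {
        /* Everyone else just sends its partial sum to the master. */
        MPI_Send(&my_sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```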
38
Example (cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
Core      0   1   2   3   4   5   6   7
my_sum    8  19   7  15   7  13  12  14

Global sum:
8 + 19 + 7 + 15 + 7 + 13 + 12 + 14 = 95

Core      0   1   2   3   4   5   6   7
my_sum   95  19   7  15   7  13  12  14
39
Copyright © 2010, Elsevier Inc. All rights Reserved
But wait!
There’s a much better way
to compute the global sum.
40
Better parallel algorithm
◼ Don’t make the master core do all the
work.
◼ Share it among the other cores.
◼ Pair the cores so that core 0 adds its result
with core 1’s result.
◼ Core 2 adds its result with core 3’s result,
etc.
◼ Work with odd and even numbered pairs of
cores.
Copyright © 2010, Elsevier Inc. All rights Reserved
41
Better parallel algorithm (cont.)
◼ Repeat the process now with only the
evenly ranked cores.
◼ Core 0 adds the result from core 2.
◼ Core 4 adds the result from core 6, etc.
◼ Now cores divisible by 4 repeat the
process, and so forth, until core 0 has the
final result.
Copyright © 2010, Elsevier Inc. All rights Reserved
42
Multiple cores forming a global
sum
Copyright © 2010, Elsevier Inc. All rights Reserved
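The tree diagram for this slide did not survive extraction. A hedged MPI sketch of the pairwise scheme described on the previous two slides; in real code MPI_Reduce performs this kind of reduction for you, but the explicit loop shows the tree structure:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int my_rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int my_sum = my_rank + 1;   /* placeholder partial sum */

    /* Pairwise (tree) reduction: at distance d, cores whose rank is a
       multiple of 2*d receive from rank + d; the others send and stop. */
    for (int d = 1; d < p; d *= 2) {
        if (my_rank % (2 * d) == 0) {
            int partner = my_rank + d;
            if (partner < p) {
                int value;
                MPI_Recv(&value, 1, MPI_INT, partner, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                my_sum += value;
            }
        } else {
            MPI_Send(&my_sum, 1, MPI_INT, my_rank - d, 0, MPI_COMM_WORLD);
            break;   /* this core has handed off its result */
        }
    }

    if (my_rank == 0) printf("Global sum = %d\n", my_sum);

    MPI_Finalize();
    return 0;
}
```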
43
Analysis
◼ In the first example, the master core
performs 7 receives and 7 additions.
◼ In the second example, the master core
performs 3 receives and 3 additions.
◼ The improvement is more than a factor of 2!
Copyright © 2010, Elsevier Inc. All rights Reserved
44
Analysis (cont.)
◼ The difference is more dramatic with a
larger number of cores.
◼ If we have 1000 cores:
◼ The first example would require the master to
perform 999 receives and 999 additions.
◼ The second example would only require 10
receives and 10 additions.
◼ That’s an improvement of almost a factor
of 100!
Copyright © 2010, Elsevier Inc. All rights Reserved
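In general, the naive scheme costs the master p − 1 receives and additions, while the tree scheme needs only about log2(p) of each. A quick check of the two cases on these slides:

```latex
\text{master/naive: } p - 1 \quad\text{vs.}\quad \text{tree: } \lceil \log_2 p \rceil

p = 8:\quad 8 - 1 = 7 \quad\text{vs.}\quad \lceil\log_2 8\rceil = 3

p = 1000:\quad 999 \quad\text{vs.}\quad \lceil\log_2 1000\rceil = 10,
\qquad 999 / 10 \approx 100\times \text{ improvement}
```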
45
How do we write parallel
programs?
◼ Task parallelism
◼ Partition the various tasks carried out in solving
the problem among the cores.
◼ Data parallelism
◼ Partition the data used in solving the problem
among the cores.
◼ Each core carries out similar operations on its
part of the data.
Copyright © 2010, Elsevier Inc. All rights Reserved
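A hedged OpenMP sketch contrasting the two styles on the summation example (the specific pragmas and the min/max tasks are chosen purely for illustration): the parallel for splits the data among the cores, while the sections construct gives different cores different tasks.

```c
#include <omp.h>
#include <stdio.h>

int Compute_next_value(int i) { return i % 10; }   /* stub, as before */

int main(void) {
    int n = 24, sum = 0, min = 1 << 30, max = -(1 << 30);

    /* Data parallelism: every thread runs the same loop body on its
       own share of the iterations (the data is partitioned). */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += Compute_next_value(i);

    /* Task parallelism: different threads carry out different tasks
       (here one finds the minimum, another the maximum). */
    #pragma omp parallel sections
    {
        #pragma omp section
        for (int i = 0; i < n; i++) {
            int x = Compute_next_value(i);
            if (x < min) min = x;
        }
        #pragma omp section
        for (int i = 0; i < n; i++) {
            int x = Compute_next_value(i);
            if (x > max) max = x;
        }
    }

    printf("sum = %d, min = %d, max = %d\n", sum, min, max);
    return 0;
}
```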
46
Professor P
Copyright © 2010, Elsevier Inc. All rights Reserved
15 questions
300 exams
47
Professor P’s grading assistants
Copyright © 2010, Elsevier Inc. All rights Reserved
TA#1
TA#2 TA#3
48
Division of work –
data parallelism
Copyright © 2010, Elsevier Inc. All rights Reserved
TA#1
TA#2
TA#3
100 exams
100 exams
100 exams
49
Division of work –
task parallelism
Copyright © 2010, Elsevier Inc. All rights Reserved
TA#1
TA#2
TA#3
Questions 1 - 5
Questions 6 - 10
Questions 11 - 15
50
Division of work –
data parallelism
Copyright © 2010, Elsevier Inc. All rights Reserved
51
Division of work –
task parallelism
Copyright © 2010, Elsevier Inc. All rights Reserved
Tasks
1) Receiving
2) Addition
52
Coordination
◼ Cores usually need to coordinate their work.
◼ Communication – one or more cores send
their current partial sums to another core.
◼ Load balancing – share the work evenly
among the cores so that no single core is
heavily loaded.
◼ Synchronization – because each core works
at its own pace, make sure cores do not get
too far ahead of the rest.
Copyright © 2010, Elsevier Inc. All rights Reserved
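A hedged OpenMP sketch of the three ideas on one toy loop (illustrative only, not code from the slide): the critical section plays the role of communication (partial sums flow into a shared total), schedule(dynamic) hands out iterations for load balancing, and the barrier is the synchronization that keeps threads from reading the total too early.

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    int n = 1000;
    long total = 0;

    #pragma omp parallel
    {
        long my_sum = 0;

        /* Load balancing: dynamic scheduling hands out chunks of
           iterations to whichever thread is free next. */
        #pragma omp for schedule(dynamic, 64)
        for (int i = 0; i < n; i++)
            my_sum += i;

        /* Communication: each thread adds its partial sum into the
           shared total, one thread at a time. */
        #pragma omp critical
        total += my_sum;

        /* Synchronization: no thread reads the total until every
           thread has contributed its part. */
        #pragma omp barrier

        #pragma omp single
        printf("total = %ld\n", total);
    }
    return 0;
}
```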
53
What we’ll be doing
◼ Learning to write programs that are
explicitly parallel.
◼ Using the C language.
◼ Using three different extensions to C:
◼ Message-Passing Interface (MPI)
◼ Posix Threads (Pthreads)
◼ Open Multi-Processing (OpenMP)
Copyright © 2010, Elsevier Inc. All rights Reserved
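MPI and OpenMP already appeared in the sketches earlier in this chapter; for completeness, a minimal Pthreads sketch (the thread count is an arbitrary choice for illustration). Compile with -lpthread.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4   /* arbitrary choice for illustration */

void *Hello(void *arg) {
    long my_rank = (long) arg;
    printf("Hello from thread %ld\n", my_rank);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];

    /* Fork: start one thread per "core". */
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, Hello, (void *) t);

    /* Join: wait for all of them to finish. */
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);

    return 0;
}
```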
54
Type of parallel systems
◼ Shared-memory
◼ The cores can share access to the computer’s
memory.
◼ Coordinate the cores by having them examine
and update shared memory locations.
◼ Distributed-memory
◼ Each core has its own, private memory.
◼ The cores must communicate explicitly by
sending messages across a network.
Copyright © 2010, Elsevier Inc. All rights Reserved
55
Type of parallel systems
Copyright © 2010, Elsevier Inc. All rights Reserved
Shared-memory Distributed-memory
56
Terminology
◼ Concurrent computing – a program is one
in which multiple tasks can be in progress
at any instant.
◼ Parallel computing – a program is one in
which multiple tasks cooperate closely to
solve a problem.
◼ Distributed computing – a program may
need to cooperate with other programs to
solve a problem.
Copyright © 2010, Elsevier Inc. All rights Reserved
57
Concluding Remarks (1)
◼ The laws of physics have brought us to the
doorstep of multicore technology.
◼ Serial programs typically don’t benefit from
multiple cores.
◼ Automatic parallel program generation
from serial program code isn’t the most
efficient approach to get high performance
from multicore computers.
Copyright © 2010, Elsevier Inc. All rights Reserved
58
Concluding Remarks (2)
◼ Learning to write parallel programs
involves learning how to coordinate the
cores.
◼ Parallel programs are usually very
complex and therefore require sound
programming techniques and careful development.
Copyright © 2010, Elsevier Inc. All rights Reserved