Introduction to Parallel Computation
FDI 2007 Track Q
Day 1 – Morning Session
Track Q Overview
• Monday
– Intro to Parallel Computation
– Parallel Architectures and Parallel Programming Concepts
– Message-Passing Paradigm, MPI
– MPI Topics
• Tuesday
– Data-Parallelism
– Master/Worker and Asynchronous Communication
– Parallelizing Sequential Codes
– Performance: Evaluation, Tuning, Visualization
• Wednesday
– Shared-Memory Parallel Computing, OpenMP
– Practicum, BYOC
What is Parallel Computing?
• “Multiple CPUs cooperating to solve one problem.”
• Motivation:
– Solve a given problem faster
– Solve a larger problem in the same time
– (Take advantage of multiple cores in an SMP)
• Distinguished from …
– Distributed computing
– Grid computing
– Ensemble computing
Why is Parallel Computing Difficult?
• Existing codes are too valuable to discard.
• We don’t think in parallel.
• There are hard problems that must be solved
without sacrificing performance, e.g.,
synchronization, communication, load balancing.
• Parallel computing platforms are too diverse, and
programming environments are too low-level.
Parallel Architectures
• An evolving field: vector supercomputers, MPPs,
clusters, constellations, multi-core, GPUs, …
• Shared-Memory vs. Distributed-Memory
[Figure: shared-memory design: several CPUs connected through a switch/bus to a single shared memory. Distributed-memory design: compute nodes, each with its own local memory (M), connected by an interconnect.]
Parallel Algorithms: Some Approaches
• Loop-based parallelism
• Functional vs. data parallelism
• Domain decomposition
• Pipelining
• Master/worker
• Embarrassingly parallel, ensembles,
screen-saver science
Parallel Algorithms: Loop-Based Parallelism
Do I = 1 to n
  . . .
End do

        →

Do in parallel I = 1 to n
  . . .
End do
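In OpenMP (covered Wednesday), this transformation is written as a single directive on the sequential loop. A minimal sketch in C, assuming the iterations are independent; the array a, the scalar c, and the size n are illustrative placeholders:

void scale(int n, double *a, double c)
{
    /* OpenMP divides the loop iterations among the threads of the team. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = c * a[i];
}

The directive is only valid if the loop iterations really are independent; that is the key question for loop-based parallelism.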
Parallel Algorithms: Functional vs. Data Parallelism
[Figure: two control-flow diagrams. Functional parallelism: f0 fans out to independent functions g0, g1, g2, which join at f1. Data parallelism: f0 fans out to the same function applied to data slices g(1:4), g(5:8), g(9:12), which join at f1.]
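A minimal OpenMP sketch of the same contrast in C; the functions g0, g1, g2, the kernel work(), and the 12-element array g are hypothetical stand-ins for the boxes in the diagram:

void g0(void); void g1(void); void g2(void);   /* hypothetical independent functions */
double work(double x);                         /* hypothetical per-element kernel    */

void functional_vs_data(double g[12])
{
    /* Functional parallelism: independent functions execute concurrently. */
    #pragma omp parallel sections
    {
        #pragma omp section
        g0();
        #pragma omp section
        g1();
        #pragma omp section
        g2();
    }

    /* Data parallelism: the same operation applied to different slices of g. */
    #pragma omp parallel for
    for (int i = 0; i < 12; i++)
        g[i] = work(g[i]);
}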
Parallel Algorithms: Domain Decomposition
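Here, work and data are divided according to a decomposition of the physical domain, which is very common in physics-based simulations. A minimal sketch in C, assuming a 1D domain of n cells split into contiguous blocks across p processes; the function name is illustrative:

/* Index range [lo, hi) owned by process `rank` out of `p` for an n-cell
   1D domain; any remainder cells go to the lowest-numbered ranks. */
void block_range(int n, int p, int rank, int *lo, int *hi)
{
    int base = n / p, rem = n % p;
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}

Each process then works on its own [lo, hi) range and exchanges boundary values with its neighbors as needed.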
Parallel Algorithms: Pipelining
[Figure: a pipeline of stages f1 → f2 → f3; the arrows represent data flow.]
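One message-passing realization assigns one stage per MPI rank: each rank receives an item from its predecessor, applies its stage, and forwards the result. A minimal sketch, with NITEMS and the trivial stage() function as illustrative stand-ins:

#include <mpi.h>

#define NITEMS 100

/* Trivial stand-in for the real per-stage computation. */
static double stage(int which, double x) { return x + which; }

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < NITEMS; i++) {
        double x;
        if (rank == 0)
            x = (double)i;                          /* source of the stream */
        else
            MPI_Recv(&x, 1, MPI_DOUBLE, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        x = stage(rank, x);                         /* this rank's stage */

        if (rank < size - 1)
            MPI_Send(&x, 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Parallelism comes from different ranks working on different items of the stream at the same time.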
Parallel Algorithms: Master/Worker
[Figure: a master process dispatching tasks to workers W1 through W6.]
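A minimal message-passing sketch of the pattern (Tuesday's session treats it in detail), assuming at least one worker, at least as many tasks as workers, and a trivial stand-in for the real per-task work:

#include <mpi.h>

#define NTASKS 1000

static double do_task(int t) { return 2.0 * t; }    /* stand-in for real work */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                                /* master */
        int next = 0;
        double result;
        MPI_Status st;

        /* Seed each worker with one task (tag 0 = work, tag 1 = stop). */
        for (int w = 1; w < size; w++) {
            MPI_Send(&next, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
            next++;
        }
        /* Collect results; reply with another task or a stop signal. */
        for (int recvd = 0; recvd < NTASKS; recvd++) {
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &st);
            int tag = (next < NTASKS) ? 0 : 1;
            MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, tag, MPI_COMM_WORLD);
            if (tag == 0) next++;
        }
    } else {                                        /* worker */
        int task;
        MPI_Status st;
        for (;;) {
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == 1) break;             /* no more work */
            double result = do_task(task);
            MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

Because work is handed out on demand, faster workers simply receive more tasks, which gives a natural form of load balancing.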
Evaluating and Predicting Performance
• Theoretical approaches: asymptotic
analysis, complexity theory, analytic
modeling of systems and algorithms.
• Empirical approaches: benchmarking,
metrics, visualization.
“How do I program a parallel computer?”
Some possible answers:
1. As always.
• Rely on compiler, libraries, run-time system.
• Comment: general solution very unlikely
2. New or extended programming language.
• Re-write code, use new compilers & run-time.
• Comment: lots of academic activity, small market share
“How do I program a parallel computer?” (cont’d)
3. Existing language + compiler directives.
• Compiler extension or preprocessor optionally handles explicitly parallel constructs.
• Comment: OpenMP widely used for shared-memory machines.
4. Existing language + library calls.
• Explicitly (re-)code for threads or message passing.
• Comment: the most common approach, especially for distributed-memory machines.
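As a first taste of the two dominant approaches, here is the same trivial parallel hello in each style; a minimal sketch, with the details deferred to the OpenMP and MPI sessions:

/* Approach 3: existing language + compiler directives (OpenMP). */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    printf("hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}

/* Approach 4: existing language + library calls (MPI). A separate program;
   typically compiled with mpicc and launched with mpirun. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}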


Editor's Notes

  • #1 Introductions. My own background influences what I say. Informal, collaborative, hands-on, interactive, practical. Adaptive. Just a start; pitched pretty low. Parallel Programming or Parallel Computation? Not everyone has to be a parallel programmer, but everyone wants to be an intelligent user and participant. How many have a problem in mind? How many have a code in mind? How many already use parallel computers, but want to know more? How many already know MPI?
  • #3 Your problem must be big enough, and it must scale: - if your problem takes seconds and you need milliseconds, this is not for you. - if your problem takes seconds but you need to solve a million of these problems, you need grid computing.
  • #4 Another reason it’s hard: diversity of platforms and environments. Programming environments are still too low-level. Maybe this short course will convince you it’s not for you. You still have some choices: 1) use someone else’s code; 2) use libraries; 3) use parallel languages or environments, e.g., UPC, Global Arrays.
  • #5 Quick history of parallel architectures. Questions? Important influences: federal funding, computational science, performance is critical, market forces. Both shared-memory and distributed-memory approaches have many things in common; the basic issues are similar. Stop here for the coffee break, if not before. Request an account in the 124 Lab if you don’t have one.
  • #6 So how do I parallelize a code? Or how do I evaluate a problem/algorithm to see if it has strong potential for parallelism? (Don’t think about syntax or programming/computing environments yet … just about identifying potential parallelism.) Key questions: how do I split up the work? How do I split up the data? And with shared-memory, maybe we can ignore the 2nd question(?). Review questions: - which of these are two names for the same thing? - which are the least potentially scalable? Most? - which might we be able to automate? There are tools/environments/languages that can help with all of these … some much lower level than others.
  • #7 Split the iteration space among the processors. Key question: are loop iterations independent?
  • #8 Do independent functions at the same time vs. do the same function on different data at the same time. Notice the use of control-flow diagrams.
  • #9 Work and data-decomposition derived from decomposition of the physical domain. Very common in physics-based simulations.
  • #10 A type of functional decomposition, but each step proceeds in sequence for a given piece of data. So the arrows represent data-flow here.
  • #11 Almost embarrassingly parallel. Common in some kinds of optimization algorithms, searching.
  • #12 Theoretical approaches critical for: prediction, analysis, understanding, design of new systems/algorithms. We won’t cover this. Empirical: with performance a critical issue, empirical approaches are critical as well. We’ll barely mention this.
  • #13 OK. Now syntax and programming environments. Some small progress in new languages recently.
  • #14 Pthreads is an example of the last approach for SMPs. Why has the last alternative become dominant? Portability, efficiency, scalability.