1b.1
Types of Parallel Computers
Two principal approaches:
• Shared memory multiprocessor
• Distributed memory multicomputer
ITCS 4/5145 Parallel Programming, UNC-Charlotte, B. Wilkinson, 2010. Aug 26, 2010
1b.2
Shared Memory
Multiprocessor
1b.3
Conventional Computer
Consists of a processor executing a program stored in a
(main) memory:
Each main memory location is identified by its address.
Addresses start at 0 and extend to 2^b - 1 when there are
b bits (binary digits) in the address.
[Diagram: a processor connected to main memory; instructions flow to the processor, data flows to or from the processor.]
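For example, with b = 32 address bits, addresses run from 0 to 2^32 - 1 = 4,294,967,295, giving 2^32 addressable locations (4 GiB if each location holds one byte).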
1b.4
Shared Memory Multiprocessor System
Natural way to extend the single-processor model - have multiple
processors connected to multiple memory modules, such that
each processor can access any memory module:
[Diagram: processors connected through processor-memory interconnections to memory modules, which together form one address space.]
1b.5
Simplistic view of a small shared memory
multiprocessor
Examples:
• Dual Pentiums
• Quad Pentiums
[Diagram: processors connected to a shared memory over a single bus.]
1b.6
Real computer systems have cache memory between the main
memory and the processors: Level 1 (L1) cache and Level 2 (L2) cache.
Example: Quad Shared Memory Multiprocessor
[Diagram: four processors, each with an L1 cache, L2 cache, and bus interface, connected by a processor/memory bus to a memory controller and the shared memory.]
1b.7
“Recent” innovation
• Dual-core and multi-core processors
• Two or more independent processors in one
package
• Actually an old idea but not put into wide practice
until recently.
• Since the L1 cache is usually inside the package and the L2
cache outside the package, dual-/multi-core processors
usually share the L2 cache.
1b.8
Single quad core shared memory
multiprocessor
[Diagram: a single chip containing four processors, each with its own L1 cache; the processors share an L2 cache, and a memory controller connects them to the shared memory.]
1b.9
Examples
• Intel:
– Core Duo processors -- two processors in one package
sharing a common L2 cache. 2005-2006
– Intel Core 2 family dual cores, with quad core from Nov
2006 onwards
– Core i7 processors replacing Core 2 family - Quad core
Nov 2008
– Intel Teraflops Research Chip (Polaris), a 3.16 GHz, 80-
core processor prototype.
• Xbox 360 game console -- triple core PowerPC
microprocessor.
• PlayStation 3 Cell processor -- 9 core design.
References and more information -- Wikipedia.
1b.10
Multiple quad-core multiprocessors
(example coit-grid05.uncc.edu)
[Diagram: eight processors, each with its own L1 cache, sharing L2 cache (and possibly an L3 cache); a memory controller connects them to the shared memory.]
1b.11
Programming Shared Memory
Multiprocessors
Several possible ways
1. Thread libraries - the programmer decomposes the program into
individual parallel sequences (threads), each able to access
shared variables declared outside the threads.
Example: Pthreads
2. Higher-level library functions and preprocessor compiler
directives to declare shared variables and specify
parallelism. Uses threads.
Example: OpenMP - an industry standard consisting of library
functions, compiler directives, and environment variables;
needs an OpenMP-aware compiler.
(Minimal sketches of the Pthreads and OpenMP approaches
appear after this list.)
1b.12
3. Use a modified sequential programming language -- added
syntax to declare shared variables and specify parallelism.
Example: UPC (Unified Parallel C) - needs a UPC
compiler.
4. Use a specially designed parallel programming language --
with syntax to express parallelism. Compiler automatically
creates executable code for each processor (not now
common).
5. Use a regular sequential programming language such as C
and ask parallelizing compiler to convert it into parallel
executable code. Also not now common.
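A minimal sketch of approach 1 (thread libraries) in C with Pthreads; the array contents, thread count, and partitioning are illustrative only, not taken from the slides:

```c
/* Approach 1 sketch: threads summing a shared array with Pthreads. */
#include <pthread.h>
#include <stdio.h>

#define N 8                 /* number of array elements (illustrative) */
#define NTHREADS 2          /* number of worker threads (illustrative) */

int a[N] = {1, 2, 3, 4, 5, 6, 7, 8};   /* shared variables declared outside threads */
long partial[NTHREADS];                 /* one result slot per thread */

void *worker(void *arg) {
    long id = (long) arg;               /* thread index 0..NTHREADS-1 */
    long sum = 0;
    for (int i = id * (N / NTHREADS); i < (id + 1) * (N / NTHREADS); i++)
        sum += a[i];                    /* each thread works on its own slice */
    partial[id] = sum;
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    long total = 0;
    for (long id = 0; id < NTHREADS; id++)
        pthread_create(&t[id], NULL, worker, (void *) id);
    for (long id = 0; id < NTHREADS; id++) {
        pthread_join(t[id], NULL);      /* wait for each thread, then combine */
        total += partial[id];
    }
    printf("sum = %ld\n", total);
    return 0;
}
```

Compile with a Pthreads-aware command line, e.g. gcc -pthread.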
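And a minimal sketch of approach 2 (OpenMP): a compiler directive parallelizes the loop and a reduction combines the per-thread partial sums; an OpenMP-aware compiler (e.g. gcc -fopenmp) is assumed:

```c
/* Approach 2 sketch: the same sum expressed with an OpenMP directive. */
#include <stdio.h>
#include <omp.h>

int main(void) {
    int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};   /* shared by all threads */
    long sum = 0;

    #pragma omp parallel for reduction(+:sum)   /* compiler creates the threads */
    for (int i = 0; i < 8; i++)
        sum += a[i];

    printf("sum = %ld (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
```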
1b.13
Message-Passing Multicomputer
Complete computers connected through an
interconnection network:
[Diagram: complete computers, each with a processor and local memory, exchanging messages through an interconnection network.]
1b.14
Interconnection Networks
Many explored in the 1970s and 1980s
• Limited and exhaustive interconnections
• 2- and 3-dimensional meshes
• Hypercube
• Using Switches:
– Crossbar
– Trees
– Multistage interconnection networks
1b.15
Networked Computers as a
Computing Platform
• A network of computers became a very attractive
alternative to expensive supercomputers and
parallel computer systems for high-performance
computing in the early 1990s.
• Several early projects. Notable:
– Berkeley NOW (network of workstations)
project.
– NASA Beowulf project.
1b.16
Key advantages:
• Very high performance workstations and PCs
readily available at low cost.
• The latest processors can easily be
incorporated into the system as they become
available.
• Existing software can be used or modified.
1b.17
Beowulf Clusters*
• A group of interconnected “commodity”
computers achieving high performance with
low cost.
• Typically uses commodity interconnects -
high-speed Ethernet - and the Linux OS.
* Beowulf comes from the name given to the NASA Goddard
Space Flight Center cluster project.
1b.18
Cluster Interconnects
• Originally Fast Ethernet on low-cost clusters
• Gigabit Ethernet - easy upgrade path
More Specialized/Higher Performance
• Myrinet - 2.4 Gbit/s - disadvantage: single vendor
• cLAN
• SCI (Scalable Coherent Interface)
• QsNet
• InfiniBand - may be important as InfiniBand
interfaces may be integrated on next-generation PCs
1b.19
Dedicated cluster with a master node
and compute nodes
[Diagram: a user reaches the master node through an external network; an Ethernet interface and switch connect the master node to the compute-node computers on a dedicated local network.]
1b.20
Software Tools for Clusters
• Based upon message passing programming model
• User-level libraries are provided for explicitly specifying
the messages to be sent between processes executing on
the computers.
• Use with regular programming languages (C, C++, ...).
• Can be quite difficult to program correctly as we shall
see.
Next step
• Learn the message passing
programming model, some MPI
routines, write a message-passing
program and test on the cluster.
1b.21
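Looking ahead, here is a minimal message-passing sketch in C using two standard MPI routines, MPI_Send and MPI_Recv; the value sent and the process count are illustrative, and an installed MPI library and launcher (e.g. mpirun -np 2) are assumed:

```c
/* Message-passing sketch: process 0 sends one integer to process 1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, x = 0;
    MPI_Init(&argc, &argv);                    /* start MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* which process am I? */

    if (rank == 0) {
        x = 42;                                /* illustrative value */
        MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received %d from process 0\n", x);
    }

    MPI_Finalize();                            /* shut down MPI */
    return 0;
}
```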
