1
Parallel Computer Models
CEG 4131 Computer Architecture III
Miodrag Bolic
2
Overview
• Flynn’s taxonomy
• Classification based on the memory arrangement
• Classification based on communication
• Classification based on the kind of parallelism
– Data-parallel
– Function-parallel
3
Flynn’s Taxonomy
– The most universally accepted method of classifying computer systems
– Published in the Proceedings of the IEEE in 1966
– Any computer can be placed in one of 4 broad categories
» SISD: Single instruction stream, single data stream
» SIMD: Single instruction stream, multiple data streams
» MIMD: Multiple instruction streams, multiple data streams
» MISD: Multiple instruction streams, single data stream
4
SISD
[Figure 1: a processing element (PE) exchanging instructions and data with main memory (M). Figure 2: SISD organization — the control unit issues an instruction stream (IS) to a single PE, which exchanges a data stream (DS) with memory]
5
SIMD
Applications:
• Image processing
• Matrix manipulations
• Sorting
6
SIMD Architectures
• Fine-grained
– Image-processing applications
– Large number of PEs
– Minimum-complexity PEs
– Programming language is a simple extension of a sequential language
• Coarse-grained
– Each PE is more complex and is usually built from commercial devices
– Each PE has local memory
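As a rough illustration of the SIMD idea — one instruction stream applied to many data elements — here is a minimal C sketch. It assumes a compiler with OpenMP 4.0 support (build with -fopenmp); the pragma asks the compiler to map the single addition onto vector (SIMD) hardware.

  #include <stdio.h>

  #define N 8

  int main(void) {
      float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
      float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
      float c[N];

      /* One instruction (add), many data elements: the compiler may
         execute several iterations at once with vector instructions. */
      #pragma omp simd
      for (int i = 0; i < N; i++)
          c[i] = a[i] + b[i];

      for (int i = 0; i < N; i++)
          printf("%.0f ", c[i]);
      printf("\n");
      return 0;
  }

This is the fine-grained style in miniature: the "program" stays a simple extension of sequential code, and the parallelism is over data elements.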
7
MIMD
8
MISD
Applications:
• Classification
• Robot vision
9
Flynn’s taxonomy
– Advantages of Flynn’s taxonomy
» Universally accepted
» Compact notation
» Easy to classify a system (?)
– Disadvantages of Flynn’s taxonomy
» Very coarse-grain differentiation among machine systems
» Comparison of different systems is limited
» Interconnections, I/O, and memory are not considered in the scheme
10
Classification based on memory arrangement
[Figure: (a) shared-memory multiprocessors — processors PE1 … PEn and I/O1 … I/On connected through an interconnection network to a shared memory; (b) message-passing multicomputers — nodes, each pairing a processor Pi with a local memory Mi, connected through an interconnection network]
11
Shared-memory multiprocessors
• Memory is common to all the processors.
• Processors communicate easily by means of shared variables (see the sketch after this list).
• Uniform Memory Access (UMA)
• Non-Uniform Memory Access (NUMA)
• Cache-Only Memory Architecture (COMA)
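A minimal sketch of communication through a shared variable, assuming POSIX threads (build with cc -pthread): the counter lives in memory common to both threads, and a mutex serializes the updates.

  #include <pthread.h>
  #include <stdio.h>

  static long counter = 0;                 /* shared variable */
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

  static void *worker(void *arg) {
      for (int i = 0; i < 100000; i++) {
          pthread_mutex_lock(&lock);       /* serialize access */
          counter++;
          pthread_mutex_unlock(&lock);
      }
      return NULL;
  }

  int main(void) {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("counter = %ld\n", counter);  /* prints 200000 */
      return 0;
  }

Both threads see the same memory, which is exactly what makes communication easy — and also what makes contention and synchronization the central problems of this class of machines.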
12
The UMA Model
• Tightly-coupled systems (high degree of resource
sharing)
• Suitable for general-purpose and time-sharing
applications by multiple users.
[Figure: UMA — processors P1 … Pn, each with a cache ($), access shared memory modules (Mem) through an interconnection network]
13
Symmetric and asymmetric multiprocessors
• Symmetric:
- all processors have equal access to all peripheral
devices.
- all processors are identical.
• Asymmetric:
- one processor (master) executes the operating
system
- other processors may be of different types and may
be dedicated to special tasks.
14
The NUMA Model
• The access time varies with the location of the memory word.
• Shared memory is distributed to local memories.
• All local memories form a global address space accessible by all processors.
[Figure: distributed shared memory (NUMA) — processors P1 … Pn, each with a cache ($) and a local memory module (Mem), connected by an interconnection network]
• Access time increases from cache to local memory to remote memory.
• COMA (Cache-Only Memory Architecture): a variant in which the distributed local memories act as caches.
15
Distributed memory multicomputers
• Multiple computers (nodes)
• Message-passing network (see the sketch below)
• Local memories are private; each holds its own program and data.
• No memory contention, so the number of processors can be very large.
• The processors are connected by communication lines; the precise way in which the lines are connected is called the topology of the multicomputer.
• A typical program consists of subtasks residing in all the memories.
[Figure: a multicomputer — nodes, each pairing a PE with a local memory (M), connected by an interconnection network]
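A minimal message-passing sketch, assuming an MPI installation (compile with mpicc, run with mpirun -np 2): each node runs the same program against its own private memory, and node 0 sends a value to node 1 explicitly, since there is no shared memory between them.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      int rank, value;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          value = 42;                      /* exists only in node 0's memory */
          MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
          printf("node 1 received %d\n", value);
      }

      MPI_Finalize();
      return 0;
  }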
16
Classification based on type of
interconnections
• Static networks
• Dynamic networks
17
Interconnection Network [1]
• Mode of Operation (Synchronous vs. Asynchronous)
• Control Strategy (Centralized vs. Decentralized)
• Switching Techniques (Packet switching vs. Circuit
switching)
• Topology (Static vs. Dynamic)
18
Classification based on the kind of parallelism [3]
Parallel architectures (PAs)
• Data-parallel architectures (DPs)
– Vector architectures
– Associative and neural architectures
– SIMDs
– Systolic architectures
• Function-parallel architectures
– Instruction-level PAs (ILPs)
» Pipelined processors
» VLIWs
» Superscalar processors
– Thread-level PAs
– Process-level PAs (MIMDs)
» Distributed-memory MIMD (multicomputers)
» Shared-memory MIMD (multiprocessors)
19
References
1. Advanced Computer Architecture and Parallel Processing, by Hesham El-Rewini and Mostafa Abd-El-Barr, John Wiley and Sons, 2005.
2. Advanced Computer Architecture: Parallelism, Scalability, Programmability, by K. Hwang, McGraw-Hill, 1993.
3. Advanced Computer Architectures: A Design Space Approach, by Dezső Sima, Terence Fountain and Péter Kacsuk, Pearson, 1997.
20
Speedup
• S = Speed(new) / Speed(old)
• S = [Work / Time(new)] / [Work / Time(old)]
• S = Time(old) / Time(new)
• S = Time(before improvement) / Time(after improvement)
21
Speedup
• Time (one CPU): T(1)
• Time (n CPUs): T(n)
• Speedup: S
• S = T(1)/T(n)
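A minimal C sketch of measuring S = T(1)/T(n) empirically, assuming a compiler with OpenMP support (build with -fopenmp -lm); the workload and the thread count of 4 are illustrative.

  #include <math.h>
  #include <omp.h>
  #include <stdio.h>

  /* Sum sqrt(i) over a large range using the given number of threads. */
  static double run(int nthreads, long n) {
      double sum = 0.0;
      omp_set_num_threads(nthreads);
      #pragma omp parallel for reduction(+:sum)
      for (long i = 1; i <= n; i++)
          sum += sqrt((double)i);
      return sum;
  }

  int main(void) {
      const long n = 100000000L;
      double t, t1, t4, s1, s4;

      t = omp_get_wtime(); s1 = run(1, n); t1 = omp_get_wtime() - t;
      t = omp_get_wtime(); s4 = run(4, n); t4 = omp_get_wtime() - t;

      printf("checksum: %.0f %.0f\n", s1, s4);   /* keeps the work live */
      printf("T(1) = %.2f s  T(4) = %.2f s  S = %.2f\n", t1, t4, t1 / t4);
      return 0;
  }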
22
Amdahl’s Law
The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
23
Example
Travel from A to B: a 200-mile stretch that can be covered by any vehicle, followed by a stretch that must be walked and takes 20 hours regardless.
[Figure: route from A to B — 200 miles by vehicle, then a 20-hour mandatory walk]
• Walk: 4 miles/hour
• Bike: 10 miles/hour
• Car-1: 50 miles/hour
• Car-2: 120 miles/hour
• Car-3: 600 miles/hour
24
Example
Travel from A to B: 200 miles by vehicle, then a 20-hour mandatory walk.
• Walk 4 miles/hour → 50 + 20 = 70 hours, S = 1
• Bike 10 miles/hour → 20 + 20 = 40 hours, S = 1.8
• Car-1 50 miles/hour → 4 + 20 = 24 hours, S = 2.9
• Car-2 120 miles/hour → 1.67 + 20 = 21.67 hours, S = 3.2
• Car-3 600 miles/hour → 0.33 + 20 = 20.33 hours, S = 3.4
However fast the vehicle, the 20-hour walk remains, so the total time can never drop below 20 hours and the speedup is bounded by 70/20 = 3.5.
25
Amdahl’s Law (1967)
• α: the fraction of the program that is naturally serial
• (1 − α): the fraction of the program that is naturally parallel
26
S = T(1) / T(N)
T(N) = α·T(1) + (1 − α)·T(1) / N
S = 1 / (α + (1 − α)/N) = N / (α·N + (1 − α))
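A small C sketch evaluating the formula above; the serial fraction α = 20/70 from the travel example (the mandatory walk) is used as a check, and the function name is illustrative.

  #include <stdio.h>

  /* Amdahl's Law: S = 1 / (alpha + (1 - alpha) / N) */
  static double speedup(double alpha, double n) {
      return 1.0 / (alpha + (1.0 - alpha) / n);
  }

  int main(void) {
      const double alpha = 20.0 / 70.0;  /* walking = the serial fraction */
      for (int n = 1; n <= 1024; n *= 4)
          printf("N = %4d  S = %.2f\n", n, speedup(alpha, (double)n));
      printf("limit as N -> infinity: S = %.2f\n", 1.0 / alpha);  /* 3.5 */
      return 0;
  }

Here N plays the role of the vehicle’s speed relative to walking: N = 150 (600 miles/hour ÷ 4 miles/hour) gives S ≈ 3.44, matching the 3.4 in the table, and no N can push S past 1/α = 3.5.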
27
Amdahl’s Law
[Figure: speedup S as a function of the number of processors N]
Editor's Notes

  • #3 Two types of information flow into a processor: instructions and data. The instruction stream is defined as the sequence of instructions performed by the processing unit. The data stream is defined as the data traffic exchanged between the memory and the processing unit. According to Flynn’s classification, either the instruction stream or the data stream can be single or multiple. Comparison with car assembly: SISD — one worker performs all the tasks, one at a time. MISD — each worker continues the work of the previous worker. SIMD — several workers perform the same task concurrently; when all of them have finished, they are given another task. MIMD — each worker constructs a car independently, following his own set of instructions.
  • #4 A processing element executes instructions passed to it by another entity, while a memory module holds computational values. The first figure shows the interaction between a processing element and its memory module. The second figure shows a single-instruction, single-data architecture: the control unit supplies an instruction to the processing element, and the memory module serves the role mentioned above. The memory module can also store information produced by the processing element and supply instructions to the control unit.
  • #5 This architecture can run with a considerable speedup compared to a sequential architecture. Since all processors run at the same time, some processors may end up waiting for others to finish a particular instruction. The following example shows the same instruction stream running on two different processors:

    PROCESSOR 1   |  PROCESSOR 2
    INST 1        |  INST 1
    INST 2        |  INST 2
    IF (A > B)    |  IF (A > B)   <- processor 2 finds the condition false and jumps to INST 4
    INST 3        |  (skipped)
    INST 4        |  INST 4

    When processor 1 validates the condition, it has more computation to do than processor 2, which jumps straight to INST 4 because the condition is false. The SIMD model of parallel computing consists of two parts: a front-end computer of the usual von Neumann style, and a processor array. The processor array is a set of identical, synchronized processing elements capable of simultaneously performing the same operation on different data. Each processor in the array has a small amount of local memory where the distributed data resides while it is being processed in parallel. A program can be developed and executed on the front end using a traditional serial programming language. The application program is executed by the front end in the usual serial way, but it issues commands to the processor array to carry out SIMD operations in parallel. The similarity between serial and data-parallel programming is one of the strong points of data parallelism. Synchronization is made irrelevant by the lock-step operation of the processors: processors either do nothing or perform exactly the same operation at the same time. Fine-grained architectures: each processor handles only a few data elements and has minimal complexity.
  • #10 Shared memory: a bulletin board. Message passing: letters. Using the shared-memory model for a multiprocessor can create a bottleneck: several processors may be writing at the same time, and at some instant more than one processor may be accessing the same memory location, which can greatly reduce computational throughput. Giving each processing element a local memory and using the message-passing model avoids this issue.
  • #11 Each processor may have registers, buffers, caches, and local memory banks as additional memory resources. Access control determines which process accesses are possible to which resources: for every access request issued by the processors to the shared memory, the required check is made against the contents of the access control table. Synchronization constraints limit the times at which sharing processes may access shared resources. Protection is a system feature that prevents processes from making arbitrary accesses to resources belonging to other processes.
  • #12 In the UMA model, every processor experiences the same delay when reading any memory location through its cache.
  • #14 The NUMA model: each processor has its own local memory, and all the local memories together form one large address space, with each processor holding its own portion exclusively. Ex: Processor 1 -> 0–1 GB, Processor 2 -> 1–2 GB.
  • #15 Each processor has its own local memory. These memory modules do not form a single large address space as in the NUMA model.
  • #16 In static networks, direct fixed links are established among nodes to form a fixed network; in dynamic networks, connections are established as needed. Shared-memory systems can be designed using bus-based or switch-based interconnection networks. Message-passing interconnection networks can be divided into static and dynamic.
  • #17 In the synchronous mode of operation, a single global clock is used by all components in the system, so that the whole system operates in lock-step. The asynchronous mode of operation, on the other hand, does not require a global clock; handshaking signals are used instead to coordinate the operation of asynchronous systems. While synchronous systems tend to be slower than asynchronous systems, they are race- and hazard-free. In packet switching, each packet is responsible for finding its own path from source to destination. In circuit switching, a path is established first, and the packet then travels along it from the source to its destination.