UBa/NAHPI-2020
Department of Computer
Engineering
PARALLEL AND DISTRIBUTED
COMPUTING
By
Malobe LOTTIN Cyrille .M
Network and Telecoms Engineer
PhD Student- ICT–U USA/CAMEROON
Contact
Email:malobecyrille.marcel@ictuniversity.org
Phone:243004411/695654002
CHAPTER 2
Parallel and Distributed Computer
Architectures, Performance Metrics
And Parallel Programming Models
Previous … Chap 1: General Introduction (Parallel and Distributed Computing)
CONTENTS
• INTRODUCTION
• Why Parallel Architecture?
• Modern Classification of Parallel Computers
• Structural Classification of Parallel Computers
• Parallel Computers Memory Architectures
• Hardware Classification
• Performance of Parallel Computer Architectures
- Peak and Sustained Performance
• Measuring Performance of Parallel Computers
• Other Common Benchmarks
• Parallel Programming Models
- Shared Memory Programming Model
- Thread Model
- Distributed Memory
- Data Parallel
- SPMD/MPMD
• Conclusion
Exercises ( Check your Progress, Further Reading and Evaluation)
Previously on Chap 1
 Part 1- Introducing Parallel and Distributed Computing
• Background Review of Parallel and Distributed Computing
• INTRODUCTION TO PARALLEL AND DISTRIBUTED COMPUTING
• Some key terminologies
• Why parallel Computing?
• Parallel Computing: the Facts
• Basic Design Computer Architecture: the von Neumann Architecture
• Classification of Parallel Computers (SISD,SIMD,MISD,MIMD)
• Assignment 1a
 Part 2- Initiation to Parallel Programming Principles
• High Performance Computing (HPC)
• Speed: a need to solve Complexity
• Some Case Studies Showing the need of Parallel Computing
• Challenge of explicit Parallelism
• General Structure of Parallel Programs
• Introduction to the Amdahl's LAW
• The GUSTAFSON’s LAW
• SCALABILITY
• Fixed Size Versus Scale Size
• Assignment 1b
• Conclusion
INTRODUCTION
• Parallel computer architecture is the method of organizing and maximizing the use of computer resources to achieve maximum performance.
- Performance, at any instant in time, is achievable only within the limits given by the technology.
- The same system may be characterized both as "parallel" and
"distributed"; the processors in a typical distributed system run
concurrently in parallel.
• The use of more processors to compute tasks simultaneously contributes to providing greater capability to computer systems.
• In a parallel architecture, processors may have access to a shared memory during computation in order to exchange information between them.
Image source: Wikipedia, Distributed Computing, 2020
• In a distributed architecture, each processor makes use of its own private memory (distributed memory) during computation. In this case, information is exchanged by passing messages between the processors.
• Significant characteristics of distributed systems are: concurrency of components, lack of a global clock (clock synchronization), and independent failure of components.
• The use of distributed systems to solve computational problems is called distributed computing (the problem is divided into many tasks, each handled by one or more computers, which communicate with each other via message passing).
• High-performance parallel computation on a shared-memory multiprocessor uses parallel algorithms, while the coordination of a large-scale distributed system uses distributed algorithms.
INTRODUCTION
Image source: Wikipedia, Distributed Computing, 2020
• Parallelism is nowadays present at all levels of computer architecture.
• Enhancements in processors largely explain the success in the development of parallelism.
• Today, processors are superscalar (they execute several instructions in parallel each clock cycle).
- Besides, the underlying Very Large-Scale Integration (VLSI) technology has advanced, allowing larger and larger numbers of components to fit on a chip and clock rates to increase.
• Three main elements define the structure and performance of a multiprocessor:
- Processors
- Memory Hierarchies (registers, cache, main memory, magnetic discs, magnetic tapes)
- Interconnection Network
• But the performance gap between the processor and the memory is still increasing.
• Parallelism is used by computer architecture to translate the raw potential of
the technology into greater performance and expanded capability of the
computer system
• Diversity in parallel computer architecture makes the field challenging to learn
and challenging to present.
INTRODUCTION ( Cont…)
Remember that:
A parallel computer is a collection of processing elements that
cooperate and communicate to solve large problems fast.
• The attempt to solve these large problems raises some fundamental questions that can only be answered by understanding:
- The various components of parallel and distributed systems (design and operation),
- How large a problem a given parallel and distributed system can solve,
- How processors cooperate and communicate/transmit data between them,
- The primitive abstractions that the hardware and software provide to the programmer for better control,
- And how to ensure a proper translation into performance once these elements are under control.
INTRODUCTION (Cont…)
Why Parallel Architecture?
• No matter the performance of a single processor at a given time, we can in principle achieve higher performance by utilizing many such processors, as long as we are ready to pay the price (cost).
Parallel Architecture is needed to:
 Respond to Application Trends
• Advances in hardware capability enable new application functionality, which in turn drives parallel architecture harder, since parallel architecture focuses on the most demanding of these applications.
• At the low end we have the largest volume of machines and the greatest number of users; at the high end, the most demanding applications.
• Consequence: pressure for increased performance; the most demanding applications must be written as parallel programs to respond to this demand generated from the high end.
 Satisfy the need for high-performance computing in the field of computational science and engineering
- A response to the need to simulate physical phenomena that are impossible or very costly to observe through empirical means (modeling global climate change over long periods, the evolution of galaxies, the atomic structure of materials, etc.)
 Respond to Technology Trends
• Can’t “wait for the single processor to get fast enough ”
Respond to Architectural Trends
• Advances in technology determine what is possible; architecture
translates the potential of the technology into performance and
capability .
• Four generations of computer architectures (tubes, transistors, integrated circuits, and VLSI), where the strong distinction between them is a function of the type of parallelism implemented (bit-level parallelism: from 4 bits to 64 bits; 128 bits is the future).
• There have been tremendous architectural advances over this period: bit-level parallelism, instruction-level parallelism, thread-level parallelism.
All these forces driving the development of parallel architectures can be summed up in one main quest: achieving absolute maximum performance (supercomputing).
Why Parallel Architecture ? (Cont …)
Modern Classification
According to (Sima, Fountain, Kacsuk)
Before modern classification,
Recall Flynn’s taxonomy classification of Computers
- based on the number of instructions that can be executed and how they operate on data.
Four Main Types:
• SISD: traditional sequential architecture
• SIMD: processor arrays, vector processor
• Parallel computing on a budget – reduced control unit cost
• Many early supercomputers
• MIMD: most general purpose parallel computer today
• Clusters, MPP, data centers
• MISD: not a general purpose architecture
Note: Globally, four types of parallelism are implemented:
- Bit-Level Parallelism: performance of processors based on word size (bits)
- Instruction-Level Parallelism: gives processors the ability to execute more than one instruction per clock cycle
- Task Parallelism: characterizes parallel programs
- Superword-Level Parallelism: based on vectorization techniques
Computer Architectures: SISD, SIMD, MIMD, MISD
• Classification here is based on how parallelism is achieved
• by operating on multiple data: Data parallelism
• by performing many functions in parallel: Task parallelism (function)
• Control parallelism or task parallelism, depending on the level of the functional parallelism.
Modern Classification
According to (Sima, Fountain, Kacsuk)
Parallel architectures are divided into data-parallel architectures and function-parallel architectures.

Function-parallel architectures:
- Different operations are performed on the same or on different data
- Asynchronous computation
- Speedup is smaller, as each processor executes a different thread or process on the same or a different set of data
- The amount of parallelization is proportional to the number of independent tasks to be performed
- Load balancing depends on the availability of the hardware and on scheduling algorithms such as static and dynamic scheduling
- Applicability: pipelining

Data-parallel architectures:
- The same operations are performed on different subsets of the same data
- Synchronous computation
- Speedup is larger, as there is only one execution thread operating on all sets of data
- The amount of parallelization is proportional to the input data size
- Designed for optimum load balance on a multiprocessor system
- Applicability: arrays, matrices
• Flynn’s classification Focus on the behavioral aspect of computers .
• Looking at the structure, Parallel computers can be classified based on a focus on
how processors communicate with the memory.
 When multiprocessors communicate through the global shared memory modules
then this organization is called Shared memory computer or Tightly
 when every processor in a multiprocessor system, has its own local memory and
the processors communicate via messages transmitted between their local memories,
then this organization is called Distributed memory computer or Loosely coupled system
Structural Classification of Parallel Computers
Parallel Computer Memory Architectures
Shared Memory Parallel Computer Architecture
- Processors can access all memory as global
address space
- Multi-processors can operate independently but
share the same memory resources
- Changes in a memory location effected by one
processor are visible to all other processors
Based on memory access time, we can
classify Shared memory Parallel Computers into
two:
 Uniform Memory Access (UMA)
 Non-Uniform Memory Access (NUMA)
Parallel Computer Memory Architectures (Cont…)
 Uniform Memory Access (UMA) (sometimes called Cache-Coherent UMA, CC-UMA)
• Commonly represented today by Symmetric Multiprocessor (SMP) machines
• Identical processors
• Equal access and access times to memory
Note: Cache coherence is a hardware mechanism whereby any update of a location in shared memory by one processor is announced to all the other processors.
Source: Images retrieved from https://computing.llnl.gov/tutorials/parallel_comp/#SharedMemory
Non-Uniform Memory Access (NUMA)
• The architecture often links two or more SMPs, such that:
- One SMP can directly access the memory of another SMP
- Not all processors have equal access time to all memories
- Memory access across the link is slower
Note: if cache coherence is maintained, this is also called Cache-Coherent NUMA (CC-NUMA).
• The proximity of memory to CPUs on a shared memory parallel computer makes data sharing between tasks fast and uniform.
• But there is a lack of scalability between memory and CPUs.
Parallel Computer Memory Architectures (Cont…)
Source: Images retrieved from https://computing.llnl.gov/tutorials/parallel_comp/#SharedMemory
Bruce Jacob, ... David T. Wang, in Memory Systems, 2008
Parallel Computer Memory Architectures (Cont…)
 Distributed Memory Parallel Computer Architecture
• Comes in different varieties, like the shared memory computer.
• Requires a communication network to connect inter-processor memory.
- Each processor operates independently with its own local memory
- Changes made by individual processors do not affect the memory of other processors
- Cache coherency does not apply here!
• Access to data in another processor is usually the task of the programmer (who must explicitly define how and when data is communicated).
• This architecture is cost effective (it can use commodity, off-the-shelf processors and networking).
• But the programmer bears more responsibility for data communication between processors.
Source: Retrieved from https://www.futurelearn.com/courses/supercomputing/0/steps/24022
Parallel Computer Memory Architectures (Cont…)
Source: Nikolaos Ploskas, Nikolaos Samaras, in GPU Programming in MATLAB, 2016
Parallel Computer Memory Architectures (Cont…)
Overview of Parallel Memory Architecture
Note:
- The largest and fastest computers in the world today employ both shared and distributed memory architectures (hybrid memory).
- In a hybrid design, the shared memory component can be a shared memory machine and/or graphics processing units (GPUs).
- The distributed memory component is the networking of multiple shared memory/GPU machines.
- This type of memory architecture is expected to continue to prevail and increase.
• Parallel computers can be roughly classified according to the level
at which the hardware in the parallel architecture supports
parallelism.
Hardware Classification
 Multicore Computing
 Symmetric Multiprocessing (tightly coupled multiprocessing)
- Made of a computer system with multiple identical processors that share memory and connect via a bus
- Does not comprise more than 32 processors, in order to minimize bus contention
- Symmetric multiprocessors are extremely cost-effective
Retrieved from https://en.wikipedia.org/wiki/Parallel_computing#Bit-level_parallelism, 2020
- The processor includes multiple processing units (called "cores") on the same chip
- Issues multiple instructions per clock cycle from multiple instruction streams
- Differs from a superscalar processor; but each core in a multi-core processor can potentially be superscalar as well
Superscalar: issues multiple instructions per clock cycle from one instruction stream (thread).
- Example: IBM's Cell microprocessor, used in the Sony PlayStation 3
Hardware Classification (Cont…)
 Distributed Computing (distributed memory multiprocessor)
 Cluster Computing
• Not to be confused with decentralized computing
- the allocation of resources (hardware + software) to individual workstations
• Components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another
• Components interact to achieve a common goal
• Characterized by concurrency of components, lack of a global clock, and independent failure of components
• Can include heterogeneous computations, where some nodes perform a lot more computation, some perform very little computation, and a few others perform specialized functionality
• Example: a multiplayer online game
• Loosely coupled computers that work together closely
• In some respects they can be regarded as a single computer
• Multiple standalone machines constitute a cluster, connected by a network
• Computer clusters have each node set to perform the same task, controlled and scheduled by software
• Computer clustering relies on a centralized management approach which makes the nodes available as orchestrated shared servers
• Example: IBM's Sequoia
Sources: Dinkar Sitaram, Geetha Manjunath, in Moving To The Cloud, 2012;
Cisco Systems, 2003
PERFORMANCE METRICS
Performance of parallel architectures
 Various ways to measure the performance of a parallel algorithm running
on a parallel processor.
 The most commonly used measurements are:
- Speed-up
- Efficiency / Isoefficiency
- Elapsed time (a very important factor)
- Price/performance (elapsed time for a program divided by the cost of the machine that ran the job)
Note: none of these metrics should be used independently of the run time of the parallel system.
 Common metrics of Performance
• FLOPS and MIPS are units of measure for the numerical computing performance of a
computer
• Distributed computing uses the Internet to link personal computers to achieve more
FLOPS
- MIPS: millions of instructions per second
MIPS = instruction count / (execution time × 10^6)
- MFLOPS: millions of floating point operations per second
MFLOPS = FP operations in program / (execution time × 10^6)
• Which metric is better?
• FLOP count is more closely tied to the actual work of a numerical code; the number of FLOPs per program is determined by, for example, the matrix size (a small worked example follows).
See Chapter 1
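As a small worked illustration of the two formulas above (the instruction count, floating point operation count and execution time below are made-up numbers, not measurements from any real program):

```cpp
#include <iostream>

int main() {
    // Hypothetical measurements, for illustration only.
    double instruction_count = 4.0e9;  // instructions executed by the program
    double fp_ops            = 1.5e9;  // floating point operations in the program
    double exec_time_s       = 2.0;    // execution time in seconds

    // Direct application of the two formulas above.
    double mips   = instruction_count / (exec_time_s * 1.0e6);
    double mflops = fp_ops            / (exec_time_s * 1.0e6);

    std::cout << "MIPS   = " << mips   << "\n";  // 2000
    std::cout << "MFLOPS = " << mflops << "\n";  // 750
    return 0;
}
```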
“In June 2020, Fugaku turned in a High Performance Linpack (HPL) result
of 415.5 petaFLOPS, besting the now second-place Summit system by a
factor of 2.8x. Fugaku is powered by Fujitsu’s 48-core A64FX SoC,
becoming the first number one system on the list to be powered by ARM
processors. In single or further reduced precision, used in machine learning
and AI applications, Fugaku’s peak performance is over 1,000 petaflops (1
exaflops). The new system is installed at RIKEN Center for Computational
Science (R-CCS) in Kobe, Japan ” (wikipedia Flops, 2020).
Performance of parallel architectures
[Figure: single-CPU performance over time, annotated "Here we are!" and "The future"]
Peak and sustained performance
Peak performance
• Measured in MFLOPS
• Highest possible MFLOPS when the system does nothing but
numerical computation
• Rough hardware measure
• Little indication on how the system will perform in practice.
Peak Theoretical Performance
• Node performance in GFlops = (CPU speed in GHz) × (number of CPU cores) × (CPU instructions per cycle) × (number of CPUs per node)
(a worked sketch follows after the sustained performance notes below)
Peak and sustained performance
• Sustained performance
• The MFLOPS rate that a program achieves over the entire run.
• Measuring sustained performance
• Using benchmarks
• Peak MFLOPS is usually much larger than sustained MFLOPS
• Efficiency rate = sustained MFLOPS / peak MFLOPS
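A minimal sketch applying the peak-performance formula and the efficiency rate above; the node configuration and the sustained benchmark result are assumptions chosen only for illustration:

```cpp
#include <iostream>

int main() {
    // Hypothetical node configuration (all numbers are assumptions).
    double clock_ghz       = 2.5;  // CPU speed in GHz
    int    cores_per_cpu   = 16;   // number of CPU cores
    int    flops_per_cycle = 8;    // CPU (floating point) instructions per cycle
    int    cpus_per_node   = 2;    // number of CPUs per node

    // Peak theoretical node performance, as in the formula above.
    double peak_gflops = clock_ghz * cores_per_cpu * flops_per_cycle * cpus_per_node;

    // Assume a benchmark sustained 400 GFLOPS over its entire run.
    double sustained_gflops = 400.0;
    double efficiency = sustained_gflops / peak_gflops;

    std::cout << "Peak:       " << peak_gflops << " GFLOPS\n";    // 640 GFLOPS
    std::cout << "Efficiency: " << efficiency * 100.0 << " %\n";  // 62.5 %
    return 0;
}
```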
Measuring the performance of
parallel computers
• Benchmarks: programs that are used to measure the
performance.
• LINPACK benchmark: a measure of a system’s floating point
computing power
• Solving a dense N by N system of linear equations Ax=b
• Used to rank supercomputers in the TOP500 list.
No. 1 since June 2020:
Fugaku is powered by Fujitsu's 48-core A64FX SoC, becoming the first number-one system on the list to be powered by ARM processors.
Other common benchmarks
• Micro benchmark suites
• Numerical computing
• LAPACK
• ScaLAPACK
• Memory bandwidth
• STREAM
• Kernel benchmarks
• NPB (NAS parallel benchmark)
• PARKBENCH
• SPEC
• Splash
PARALLEL PROGRAMMING MODELS
A programming perspective of Parallelism implementation in parallel
and distributed Computer architectures
Parallel Programming Models
Parallel programming models exist as an abstraction above hardware
and memory architectures.
 There are commonly several parallel programming models used
• Shared Memory (without threads)
• Threads
• Distributed Memory / Message Passing
• Data Parallel
• Hybrid
• Single Program Multiple Data (SPMD)
• Multiple Program Multiple Data (MPMD)
 These models are NOT specific to a particular type of machine or
memory architecture (a given model can be implemented on any
underlying hardware).
Example: a SHARED memory model on a DISTRIBUTED memory machine (machine memory is physically distributed across networked machines, but appears at the user level as a single shared memory global address space, as in the Kendall Square Research (KSR) ALLCACHE approach).
Which model to use?
There is no "best" model.
However, there are certainly better implementations of some models than of others.
Parallel Programming Models
Shared Memory Programming Model
(Without Threads)
• A thread is the basic unit to which the operating system allocates processor time; it is the smallest sequence of programmed instructions that can be managed independently.
• In a shared memory programming model,
- Processes/tasks share a common address space, which they read and write to asynchronously.
- They make use of mechanisms such as locks/semaphores to control access to the shared memory, resolve contention, and prevent race conditions and deadlocks.
• This may be considered the simplest parallel programming model.
• Note: locks, mutexes and semaphores are types of synchronization objects in a shared-resource environment. They are abstract concepts.
- A lock protects access to some kind of shared resource and grants the right to access the protected shared resource when owned.
For example, if you have a lockable object ABC you may:
- acquire the lock on ABC,
- take the lock on ABC,
- lock ABC,
- take ownership of ABC, or relinquish ownership of ABC when no longer needed.
- Mutex (MUTual EXclusion): a lockable object that can be owned by exactly one thread at a time.
• Example: in C++, std::mutex, std::timed_mutex, std::recursive_mutex
- Semaphore: a very relaxed type of lockable object, with a predefined maximum count and a current count.
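A minimal C++ sketch of the lock/mutex idea above, using the std::mutex mentioned in the example; the shared counter, thread count and iteration count are arbitrary choices for illustration:

```cpp
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex counter_lock;  // protects the shared counter
long counter = 0;         // shared resource

void worker(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> guard(counter_lock);  // acquire the lock (released at scope exit)
        ++counter;                                         // safe: only one thread owns the lock here
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back(worker, 100000);
    for (auto& th : threads)
        th.join();
    std::cout << "Final counter = " << counter << "\n";  // 400000, no race condition
    return 0;
}
```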
Shared Memory Programming Model (Cont…)
Advantages:
• No need to explicitly specify the communication of data between tasks, so there is no notion of data "ownership" to implement. Very advantageous for a programmer.
• All processes see and have equal access to shared memory.
• Open to simplification during the development of the program.
Disadvantages:
• It becomes more difficult to understand and manage data locality.
• Keeping data local to a given process conserves memory accesses, cache refreshes and bus traffic, but controlling data locality is hard to understand and may be beyond the control of the average user.
Shared Memory Programming Model (Cont…)
During Implementation,
• Case: stand-alone shared memory machines
- native operating systems, compilers and/or hardware provide support for
shared memory programming. E.g. POSIX standard provides an API for using shared memory.
• Case: distributed memory machines:
- memory is physically distributed across a network of machines, but made
global through specialized hardware and software
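As a hedged sketch of the POSIX shared memory API mentioned above (the object name and region size are arbitrary; error handling is kept minimal, and on some systems linking with -lrt is required):

```cpp
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    const char* name = "/demo_region";  // hypothetical shared memory object name
    const size_t size = 4096;

    // Create (or open) a named shared memory object and size it.
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { std::perror("shm_open"); return 1; }
    if (ftruncate(fd, size) != 0) { std::perror("ftruncate"); return 1; }

    // Map it into this process's address space.
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }

    // Any cooperating process that maps the same name sees this data.
    std::strcpy(static_cast<char*>(mem), "hello from shared memory");

    munmap(mem, size);
    close(fd);
    shm_unlink(name);  // remove the object once no longer needed
    return 0;
}
```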
• This is a type of shared memory programming.
• Here, a single "heavy weight" process can have multiple "light weight",
concurrent execution paths.
• To understand this model, let us consider the execution of a main
program a.out , scheduled to run by the native operating system.
Thread Model
 a.out starts by loading and acquiring all of the necessary system and user resources to run. This constitutes the "heavy weight" process.
 a.out performs some serial work, and then creates a number of tasks (threads) that can be scheduled and run by the operating system concurrently.
 Each thread has local data, but also shares the entire resources of a.out ("light weight") and benefits from a global memory view because it shares the memory space of a.out.
 Synchronization/coordination is needed to ensure that no more than one thread updates the same global address at any time.
• During implementation, threads implementations commonly comprise:
 A library of subroutines that are called from within parallel source code
 A set of compiler directives embedded in either serial or parallel source code
Note: often, the programmer is responsible for determining the parallelism.
• Unrelated standardization efforts have resulted in two very different
implementations of threads:
- POSIX Threads
* Specified by the IEEE POSIX 1003.1c standard (1995). C Language only, Part of Unix/Linux operating systems and
Very explicit parallelism--requires significant programmer attention to detail.
- OpenMP ( Used for Tutorial in the context of this course).
* Industry standard, Compiler directive based Portable / multi-platform, including Unix and Windows
platforms, available in C/C++ and Fortran implementations, Can be very easy and simple to use - provides for
"incremental parallelism". Can begin with serial code.
Others include: - Microsoft threads
- Java, Python threads
- CUDA threads for GPUs
Thread Model (Cont…)
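Since OpenMP is the threading implementation used for the tutorials in this course, here is a minimal sketch of the thread model: the master ("heavy weight") process runs serially, then a team of threads shares the loop iterations (the array size is arbitrary; compile with an OpenMP-capable compiler, e.g. g++ -fopenmp):

```cpp
#include <cstdio>
#include <omp.h>

int main() {
    const int N = 8;
    double a[N];

    // Serial ("heavy weight") part runs on the master thread only.
    std::printf("serial part, %d threads available\n", omp_get_max_threads());

    // A team of "light weight" threads is created here and shares the iterations.
    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        a[i] = 2.0 * i;  // each thread works on its share, all see the same array
        std::printf("iteration %d done by thread %d\n", i, omp_get_thread_num());
    }

    // Implicit barrier at the end of the parallel region; back to serial execution.
    std::printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```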
• In the Distributed Memory / Message Passing Model,
 A set of tasks use their own local memory during computation.
 Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines.
 Tasks exchange data through communication (sending/receiving messages).
 But there must be a certain cooperation between processes during data transfer (for example, a send operation must have a matching receive operation).
During implementation,
• The programmer is responsible for determining all parallelism.
• Message passing implementations usually comprise a library of subroutines that are embedded in source code.
• MPI is the "de facto" industry standard for message passing.
- Message Passing Interface (MPI), specification available at http://www.mpi-forum.org/docs/.
Distributed Memory / Message Passing Model
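A minimal MPI sketch of the send/receive cooperation described above: two tasks, each with its own local data, exchange one integer by message passing (the tag and value are arbitrary; assumes an MPI installation, compile with mpic++ and run with mpirun -np 2):

```cpp
#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // each task learns its own id

    if (rank == 0) {
        int value = 42;  // data held in rank 0's local memory
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  // explicit send ...
    } else if (rank == 1) {
        int value = 0;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                          // ... matched by a receive
        std::printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```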
The Data Parallel model can also be referred to as the Partitioned Global Address Space (PGAS) model.
Here,
 The address space is treated globally.
 Most of the parallel work focuses on performing operations on a data set, typically organized into a common structure such as an array or cube.
 A set of tasks work collectively on the same data structure; however, each task works on a different partition of the same data structure.
 Tasks perform the same operation on their partition of work, for example "add 4 to every array element".
 Can be implemented on shared memory (the data structure is accessed through global memory) and on distributed memory architectures (the global data structure can be logically and/or physically split across tasks).
Data Parallel Model
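As an illustration of the "add 4 to every array element" example, a minimal data parallel sketch using an OpenMP work-sharing loop on a shared memory machine (the array size is arbitrary; on a distributed memory machine the array would instead be split across tasks):

```cpp
#include <cstdio>

int main() {
    const int N = 16;
    int data[N];
    for (int i = 0; i < N; ++i) data[i] = i;  // the common data structure

    // Every thread performs the same operation ("add 4") on its own
    // partition of the array: the essence of the data parallel model.
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        data[i] += 4;

    for (int i = 0; i < N; ++i) std::printf("%d ", data[i]);
    std::printf("\n");
    return 0;
}
```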
For the implementation,
• Various popular, and sometimes developmental, parallel programming implementations are based on the Data Parallel / PGAS model.
• - Coarray Fortran, compiler dependent
* further reading (https://en.wikipedia.org/wiki/Coarray_Fortran)
• - Unified Parallel C (UPC), extension to the C programming
language for SPMD parallel programming.
* further reading http://upc.lbl.gov/
- Global Arrays , shared memory style programming environment in the context of
distributed array data structures.
* Further reading on https://en.wikipedia.org/wiki/Global_Arrays
Data Parallel Model ( Cont…)
Single Program Multiple Data (SPMD) and Multiple Program Multiple Data (MPMD) are "high level" programming models that can be built on top of any of the parallel programming models above.

Single Program Multiple Data (SPMD):
- Why SINGLE PROGRAM? All tasks execute their own copy of the same program (threads, message passing, data parallel or hybrid) simultaneously.
- Why MULTIPLE DATA? All tasks may use different data.
- "Intelligent" enough: tasks do not necessarily have to execute the entire program; they may run only a portion of it.

Multiple Program Multiple Data (MPMD):
- Why MULTIPLE PROGRAM? Tasks may execute different programs (threads, message passing, data parallel or hybrid) simultaneously.
- Why MULTIPLE DATA? All tasks may use different data.
- Not as "intelligent" as SPMD, but may be better suited for certain types of problems (functional decomposition problems).

Single Program Multiple Data (SPMD) / Multiple Program Multiple Data (MPMD)
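A minimal SPMD sketch: every process runs the same MPI program below, but branches on its rank so that each task works on its own partition and not every task executes every part (hypothetical example; same compile/run assumptions as the earlier MPI sketch):

```cpp
#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Single Program: every process executes this same code.
    // Multiple Data: each rank works on its own partition, identified by its rank.
    std::printf("rank %d of %d working on partition %d\n", rank, size, rank);

    if (rank == 0) {
        // Only rank 0 executes this branch: tasks need not run the entire program.
        std::printf("rank 0 additionally performs the coordination work\n");
    }

    MPI_Finalize();
    return 0;
}
```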
Conclusion
• Parallel computer architectures contribute to achieving maximum performance within the limits given by the technology.
• Diversity in parallel computer architecture makes the field challenging to learn and challenging to
present
• Classification can be based on the number of instructions that can be executed and how they
operate on data- Flynn (SISD,SIMD,MISD,MIMD)
• Also, classification can be based on how parallelism is achieved (Data parallel architectures,
Function-parallel architectures)
• Classification can as well focus on how processors communicate with the memory (shared memory computer or tightly coupled system, distributed memory computer or loosely coupled system)
• There must be a way to assess the performance of the parallel architecture
• FLOPS and MIPS are units of measure for the numerical computing performance of a computer.
• Parallelism is made possible with the implementation of adequate parallel programming models.
• The simplest model appears to be the Shared Memory Programming Model.
• SPMD and MPMD programming require mastery of the previous programming models for proper implementation.
• How do we then design a Parallel Program for effective parallelism?
See Next Chapter: Designing Parallel Programs and understanding notion of
Concurrency and Decomposition.
Challenge your understanding
1- What difference do you make between Parallel computer and Parallel Computing ?
2- What do you understand by True data dependency and Resource dependency?
3- Illustrate the notion of Vertical Waste and Horizontal Waste.
4- According to you, which of the design architectures can provide better performance? Use performance metrics to justify your arguments.
5- What is a concurrent-read, concurrent-write (CRCW) PRAM?
6-
On this figure, we have an illustration of bus-based interconnects (a) with no local caches and (b) with local memory/caches.
Explain the difference focusing on :
- The design architecture
- The operation
- The Pros and Cons
7- Discuss the HANDLER’S CLASSIFICATION of computer architectures compared with Flynn’s and other classifications.
Class Work Group and Presentation
• Purpose: demonstrate the conditions to detect potential parallelism.
“Parallel computing requires that the segments to be executed
in parallel must be independent of each other. So, before
executing parallelism, all the conditions of parallelism between
the segments must be analyzed”.
Use Bernstein's Conditions for the Detection of Parallelism to demonstrate when instructions i1, i2, …, in can be said to be "parallelizable".
REFERENCES
1. Xin Yuan, CIS4930/CDA5125: Parallel and Distributed Systems, retrieved from http://www.cs.fsu.edu/~xyuan/cda5125/index.html
2. EECC722 – Shaaban, #1 lec # 3, Fall 2000, 9-18-2000
3. Blaise Barney, Lawrence Livermore National Laboratory, https://computing.llnl.gov/tutorials/parallel_comp/#ModelsOverview, last modified 11/02/2020
4. J. Blazewicz et al., Handbook on Parallel and Distributed Processing, International Handbooks on Information Systems, Springer, 2000
5. Phillip J. Windley, Parallel Architectures, Lesson 6, CS462: Large-Scale Distributed Systems, 2020
6. A. Grama et al., Introduction to Parallel Computing, Lecture 3
END.

More Related Content

What's hot

Directed Acyclic Graph
Directed Acyclic Graph Directed Acyclic Graph
Directed Acyclic Graph
AJAL A J
 
Lecture 01 introduction to database
Lecture 01 introduction to databaseLecture 01 introduction to database
Lecture 01 introduction to databaseemailharmeet
 
Thread scheduling in Operating Systems
Thread scheduling in Operating SystemsThread scheduling in Operating Systems
Thread scheduling in Operating SystemsNitish Gulati
 
Parallelism
ParallelismParallelism
Parallelism
Md Raseduzzaman
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMSkoolkampus
 
2 parallel processing presentation ph d 1st semester
2 parallel processing presentation ph d 1st semester2 parallel processing presentation ph d 1st semester
2 parallel processing presentation ph d 1st semester
Rafi Ullah
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
Julie Iskander
 
Parallel programming model
Parallel programming modelParallel programming model
Parallel programming model
Illuru Phani Kumar
 
Data Structure and Algorithms
Data Structure and AlgorithmsData Structure and Algorithms
Data Structure and Algorithms
iqbalphy1
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithmsguest084d20
 
Distributed Computing ppt
Distributed Computing pptDistributed Computing ppt
Algorithms Lecture 1: Introduction to Algorithms
Algorithms Lecture 1: Introduction to AlgorithmsAlgorithms Lecture 1: Introduction to Algorithms
Algorithms Lecture 1: Introduction to Algorithms
Mohamed Loey
 
Bfs and Dfs
Bfs and DfsBfs and Dfs
Bfs and Dfs
Masud Parvaze
 
Superscalar & superpipeline processor
Superscalar & superpipeline processorSuperscalar & superpipeline processor
Superscalar & superpipeline processorMuhammad Ishaq
 
Recursion - Algorithms and Data Structures
Recursion - Algorithms and Data StructuresRecursion - Algorithms and Data Structures
Recursion - Algorithms and Data Structures
Priyanka Rana
 
Distributed data processing
Distributed data processingDistributed data processing
Distributed data processing
Ayisha Kowsar
 
Merge sort algorithm
Merge sort algorithmMerge sort algorithm
Merge sort algorithm
Shubham Dwivedi
 
Concurrency Control in Distributed Database.
Concurrency Control in Distributed Database.Concurrency Control in Distributed Database.
Concurrency Control in Distributed Database.
Meghaj Mallick
 
Algorithms Lecture 7: Graph Algorithms
Algorithms Lecture 7: Graph AlgorithmsAlgorithms Lecture 7: Graph Algorithms
Algorithms Lecture 7: Graph Algorithms
Mohamed Loey
 

What's hot (20)

Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Directed Acyclic Graph
Directed Acyclic Graph Directed Acyclic Graph
Directed Acyclic Graph
 
Lecture 01 introduction to database
Lecture 01 introduction to databaseLecture 01 introduction to database
Lecture 01 introduction to database
 
Thread scheduling in Operating Systems
Thread scheduling in Operating SystemsThread scheduling in Operating Systems
Thread scheduling in Operating Systems
 
Parallelism
ParallelismParallelism
Parallelism
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
 
2 parallel processing presentation ph d 1st semester
2 parallel processing presentation ph d 1st semester2 parallel processing presentation ph d 1st semester
2 parallel processing presentation ph d 1st semester
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
 
Parallel programming model
Parallel programming modelParallel programming model
Parallel programming model
 
Data Structure and Algorithms
Data Structure and AlgorithmsData Structure and Algorithms
Data Structure and Algorithms
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
 
Distributed Computing ppt
Distributed Computing pptDistributed Computing ppt
Distributed Computing ppt
 
Algorithms Lecture 1: Introduction to Algorithms
Algorithms Lecture 1: Introduction to AlgorithmsAlgorithms Lecture 1: Introduction to Algorithms
Algorithms Lecture 1: Introduction to Algorithms
 
Bfs and Dfs
Bfs and DfsBfs and Dfs
Bfs and Dfs
 
Superscalar & superpipeline processor
Superscalar & superpipeline processorSuperscalar & superpipeline processor
Superscalar & superpipeline processor
 
Recursion - Algorithms and Data Structures
Recursion - Algorithms and Data StructuresRecursion - Algorithms and Data Structures
Recursion - Algorithms and Data Structures
 
Distributed data processing
Distributed data processingDistributed data processing
Distributed data processing
 
Merge sort algorithm
Merge sort algorithmMerge sort algorithm
Merge sort algorithm
 
Concurrency Control in Distributed Database.
Concurrency Control in Distributed Database.Concurrency Control in Distributed Database.
Concurrency Control in Distributed Database.
 
Algorithms Lecture 7: Graph Algorithms
Algorithms Lecture 7: Graph AlgorithmsAlgorithms Lecture 7: Graph Algorithms
Algorithms Lecture 7: Graph Algorithms
 

Similar to Chap 2 classification of parralel architecture and introduction to parllel program. models

Chap 1(one) general introduction
Chap 1(one)  general introductionChap 1(one)  general introduction
Chap 1(one) general introduction
Malobe Lottin Cyrille Marcel
 
Parallel processing
Parallel processingParallel processing
Parallel processing
Praveen Kumar
 
CC unit 1.pptx
CC unit 1.pptxCC unit 1.pptx
CC unit 1.pptx
DivyaRadharapu1
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud Infrastructures
CloudLightning
 
CCUnit1.pdf
CCUnit1.pdfCCUnit1.pdf
CCUnit1.pdf
AnayGupta26
 
Aca module 1
Aca module 1Aca module 1
Aca module 1
Avinash_N Rao
 
Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptx
krnaween
 
Parallel and Distributed Computing chapter 1
Parallel and Distributed Computing chapter 1Parallel and Distributed Computing chapter 1
Parallel and Distributed Computing chapter 1
AbdullahMunir32
 
SecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.pptSecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.ppt
RubenGabrielHernande
 
Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)
Sudarshan Mondal
 
Cloud computing basic introduction and notes for exam
Cloud computing basic introduction and notes for examCloud computing basic introduction and notes for exam
Cloud computing basic introduction and notes for exam
UtkarshAnand512529
 
Computing notes
Computing notesComputing notes
Computing notesthenraju24
 
CSA unit5.pptx
CSA unit5.pptxCSA unit5.pptx
CSA unit5.pptx
AbcvDef
 
Data Parallel and Object Oriented Model
Data Parallel and Object Oriented ModelData Parallel and Object Oriented Model
Data Parallel and Object Oriented Model
Nikhil Sharma
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
Akhila Prabhakaran
 
introduction to cloud computing for college.pdf
introduction to cloud computing for college.pdfintroduction to cloud computing for college.pdf
introduction to cloud computing for college.pdf
snehan789
 
Week 1 lecture material cc
Week 1 lecture material ccWeek 1 lecture material cc
Week 1 lecture material cc
Ankit Gupta
 
_Cloud_Computing_Overview.pdf
_Cloud_Computing_Overview.pdf_Cloud_Computing_Overview.pdf
_Cloud_Computing_Overview.pdf
TyStrk
 
Week 1 Lecture_1-5 CC_watermark.pdf
Week 1 Lecture_1-5 CC_watermark.pdfWeek 1 Lecture_1-5 CC_watermark.pdf
Week 1 Lecture_1-5 CC_watermark.pdf
John422973
 

Similar to Chap 2 classification of parralel architecture and introduction to parllel program. models (20)

Chap 1(one) general introduction
Chap 1(one)  general introductionChap 1(one)  general introduction
Chap 1(one) general introduction
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
CC unit 1.pptx
CC unit 1.pptxCC unit 1.pptx
CC unit 1.pptx
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud Infrastructures
 
CCUnit1.pdf
CCUnit1.pdfCCUnit1.pdf
CCUnit1.pdf
 
Aca module 1
Aca module 1Aca module 1
Aca module 1
 
Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptx
 
Parallel and Distributed Computing chapter 1
Parallel and Distributed Computing chapter 1Parallel and Distributed Computing chapter 1
Parallel and Distributed Computing chapter 1
 
SecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.pptSecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.ppt
 
Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)
 
Cloud computing basic introduction and notes for exam
Cloud computing basic introduction and notes for examCloud computing basic introduction and notes for exam
Cloud computing basic introduction and notes for exam
 
Computing notes
Computing notesComputing notes
Computing notes
 
Par com
Par comPar com
Par com
 
CSA unit5.pptx
CSA unit5.pptxCSA unit5.pptx
CSA unit5.pptx
 
Data Parallel and Object Oriented Model
Data Parallel and Object Oriented ModelData Parallel and Object Oriented Model
Data Parallel and Object Oriented Model
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
introduction to cloud computing for college.pdf
introduction to cloud computing for college.pdfintroduction to cloud computing for college.pdf
introduction to cloud computing for college.pdf
 
Week 1 lecture material cc
Week 1 lecture material ccWeek 1 lecture material cc
Week 1 lecture material cc
 
_Cloud_Computing_Overview.pdf
_Cloud_Computing_Overview.pdf_Cloud_Computing_Overview.pdf
_Cloud_Computing_Overview.pdf
 
Week 1 Lecture_1-5 CC_watermark.pdf
Week 1 Lecture_1-5 CC_watermark.pdfWeek 1 Lecture_1-5 CC_watermark.pdf
Week 1 Lecture_1-5 CC_watermark.pdf
 

Recently uploaded

Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
MuhammadTufail242431
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
Kamal Acharya
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
LIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.pptLIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.ppt
ssuser9bd3ba
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
abh.arya
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 

Recently uploaded (20)

Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
LIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.pptLIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.ppt
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 

Chap 2 classification of parralel architecture and introduction to parllel program. models

  • 1. UBa/NAHPI-2020 DepartmentofComputer Engineering PARALLEL AND DISTRIBUTED COMPUTING By Malobe LOTTIN Cyrille .M Network and Telecoms Engineer PhD Student- ICT–U USA/CAMEROON Contact Email:malobecyrille.marcel@ictuniversity.org Phone:243004411/695654002
  • 2. CHAPTER 2 Parallel and Distributed Computer Architectures, Performance Metrics And Parallel Programming Models Previous … Chap 1: General Introduction (Parallel and Distributed Computing)
  • 3. CONTENTS • INTRODUCTION • Why parallel Architecture ? • Modern Classification of Parallel Computers • Structural Classification of Parallel Computers • Parallel Computers Memory Architectures • Hardware Classification • Performance of Parallel Computers architectures - Peak and Sustained Performance • Measuring Performance of Parallel Computers • Other Common Benchmarks • Parallel Programming Models - Shared Memory Programming Model - Thread Model - Distributed Memory - Data Parallel - SPMD/MPMD • Conclusion Exercises ( Check your Progress, Further Reading and Evaluation)
  • 4. Previously on Chap 1  Part 1- Introducing Parallel and Distributed Computing • Background Review of Parallel and Distributed Computing • INTRODUCTION TO PARALLEL AND DISTRIBUTED COMPUTING • Some keys terminologies • Why parallel Computing? • Parallel Computing: the Facts • Basic Design Computer Architecture: the von Neumann Architecture • Classification of Parallel Computers (SISD,SIMD,MISD,MIMD) • Assignment 1a  Part 2- Initiation to Parallel Programming Principles • High Performance Computing (HPC) • Speed: a need to solve Complexity • Some Case Studies Showing the need of Parallel Computing • Challenge of explicit Parallelism • General Structure of Parallel Programs • Introduction to the Amdahl's LAW • The GUSTAFSON’s LAW • SCALIBILITY • Fixed Size Versus Scale Size • Assignment 1b • Conclusion
  • 5. INTRODUCTION • Parallel Computer Architecture is the method that consist of Maximizing and organizing computer resources to achieve Maximum performance. - Performance at any instance of time, is achievable within the limit given by the technology. - The same system may be characterized both as "parallel" and "distributed"; the processors in a typical distributed system run concurrently in parallel. • The use of more processors to compute tasks simultaneously contribute in providing more features to computers systems. • In the Parallel architecture, Processors during computation may have access to a shared memory to exchange information between them. • imagesSource:Wikipedia,DistributingComputing,2020
  • 6. • In a Distributed architecture, each processor during computation, make use of its own private memory (distributed memory). In this case, Information is exchanged by passing messages between the processors. • Significant characteristics of distributed systems are: concurrency of components, lack of a global clock (Clock synchronization) , and independent failure of components. • The use of distributed systems to solve computational problems is Called Distributed Computing (Divide problem into many tasks, each task is handle by one or more computers, which communicate with each other via message passing). • High-performance parallel computation operating shared-memory multiprocessor uses parallel algorithms while the coordination of a large-scale distributed system uses distributed algorithms. INTRODUCTION imagesSource:Wikipedia,DistributingComputing,2020
  • 7. • Parallelism is nowadays in all levels of computer architectures. • It is the Enhancements of Processors that justify the success in the development of Parallelism. • Today, they are superscalar (Execute several instructions in parallel each clock cycle). - besides, The advancement of the underlying Very Large-Scale Integration (VLSI )technology, which allows larger and larger numbers of components to fit on a chip and clock rates to increase. • Three main elements define structure and performance of Multiprocessor: - Processors - Memory Hierarchies (registers, cache, main memory, magnetic discs, magnetic tapes) - Interconnection Network • But, the gap of performance between the processor and the memory is still increasing …. • Parallelism is used by computer architecture to translate the raw potential of the technology into greater performance and expanded capability of the computer system • Diversity in parallel computer architecture makes the field challenging to learn and challenging to present. INTRODUCTION ( Cont…)
  • 8. Remember that: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast. • The attempt to solve this large problems raises some fundamental questions which the answer can only by satisfy by understanding: - Various components of Parallel and Distributed systems( Design and operation), - How much problems a given Parallel and Distributed system can solve, - How processors corporate, communicate / transmit data between them, - The primitive abstractions that the hardware and software provide to the programmer for better control, - And, How to ensure a proper translation to performance once these elements are under control. INTRODUCTION (Cont…)
  • 9. Why Parallel Architecture ? • No matter the performance of a single processor at a given time, we can achieve in principle higher performance by utilizing many such processors so far we are ready to pay the price (Cost). Parallel Architecture is needed To:  Respond to Applications Trends • Advances in hardware capability enable new application functionality  drives parallel architecture harder, since parallel architecture focuses on the most demanding of these applications. • At the Low end level, we have the largest volume of machines and greatest number of users; at the High end, most demanding applications. • Consequence: pressure for increased performance  most demanding applications must be written as parallel programs to respond to this demand generated from the High end  Satisfy the need of High Computing in the field of computational science and engineering - A response to simulate physical phenomena impossible or very costly to observe through empirical means (modeling global climate change over long periods, the evolution of galaxies, the atomic structure of materials, etc…)
  • 10.  Respond to Technology Trends • Can’t “wait for the single processor to get fast enough ” Respond to Architectural Trends • Advances in technology determine what is possible; architecture translates the potential of the technology into performance and capability . • Four generation of Computer architectures (tubes, transistors, integrated circuits, and VLSI ) where strong distinction is function of the type of parallelism implemented ( Bit level parallelism  4-bits to 64 bits, 128 bits is the future). • There has been tremendous architectural advances over this period : Bit level parallelism, Instruction level Parallelism, Thread Level Parallelism All these forces driving the development of parallel architectures are resumed under one main quest: Achieve absolute maximum performance ( Supercomputing) Why Parallel Architecture ? (Cont …)
  • 11. Modernclassification Accordingto(Sima,Fountain,Kacsuk) Before modern classification, Recall Flynn’s taxonomy classification of Computers - based on the number of instructions that can be executed and how they operate on data. Four Main Type: • SISD: traditional sequential architecture • SIMD: processor arrays, vector processor • Parallel computing on a budget – reduced control unit cost • Many early supercomputers • MIMD: most general purpose parallel computer today • Clusters, MPP, data centers • MISD: not a general purpose architecture Note: Globally four type of parallelism are implemented: - Bit Level Parallelism: performance of processors based on word size ( bits) - Instruction Level Parallelism: give ability to processors to execute more than instruction per clock cycle - Task Parallelism: characterize Parallel programs - Superword Level Parallelism: Based on vectorization Techniques Computer Architectures SISD SIMD MIMD MISD
• 12. Modern Classification According to (Sima, Fountain, Kacsuk)
• Classification here is based on how parallelism is achieved:
- by operating on multiple data: data parallelism
- by performing many functions in parallel: task (function) parallelism
- Control parallelism or task parallelism, depending on the level of the functional parallelism.
Parallel architectures are therefore divided into data-parallel architectures and function-parallel architectures.
Function-parallel architectures:
- Different operations are performed on the same or different data
- Asynchronous computation
- Speedup is lower, as each processor executes a different thread or process on the same or a different set of data
- The amount of parallelization is proportional to the number of independent tasks to be performed
- Load balancing depends on the availability of the hardware and on scheduling algorithms such as static and dynamic scheduling
- Applicability: pipelining
Data-parallel architectures:
- The same operations are performed on different subsets of the same data
- Synchronous computation
- Speedup is higher, as there is only one execution thread operating on all sets of data
- The amount of parallelization is proportional to the input data size
- Designed for optimum load balance on multiprocessor systems
- Applicability: arrays, matrices
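To make the distinction concrete in software terms, here is a minimal, illustrative OpenMP sketch in C (assuming a compiler flag such as gcc -fopenmp; the array contents and the "unrelated work" message are arbitrary). The parallel for loop expresses data parallelism (the same operation on different subsets of the data), while the sections construct expresses function/task parallelism (different operations running concurrently).

#include <stdio.h>

#define N 8

int main(void) {
    int a[N], b[N];

    /* Data parallelism: the SAME operation applied to different subsets of the data */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = i * i;

    /* Function/task parallelism: DIFFERENT operations performed in parallel */
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 0; i < N; i++)   /* task 1: derive b from a */
                b[i] = a[i] + 1;
        }
        #pragma omp section
        {
            printf("task 2: doing unrelated work\n");   /* task 2 */
        }
    }

    printf("a[%d] = %d, b[%d] = %d\n", N - 1, a[N - 1], N - 1, b[N - 1]);
    return 0;
}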
• 13. Structural Classification of Parallel Computers
• Flynn's classification focuses on the behavioral aspect of computers.
• Looking at the structure, parallel computers can instead be classified by how processors communicate with memory:
- When multiple processors communicate through globally shared memory modules, the organization is called a shared memory computer, or tightly coupled system.
- When every processor in a multiprocessor system has its own local memory and the processors communicate via messages transmitted between their local memories, the organization is called a distributed memory computer, or loosely coupled system.
• 14. Parallel Computer Memory Architectures
Shared Memory Parallel Computer Architecture
- Processors can access all memory as a global address space
- Multiple processors can operate independently but share the same memory resources
- Changes to a memory location made by one processor are visible to all other processors
Based on memory access time, shared memory parallel computers fall into two classes:
- Uniform Memory Access (UMA)
- Non-Uniform Memory Access (NUMA)
• 15. Parallel Computer Memory Architectures (Cont…)
• Uniform Memory Access (UMA) (also known as Cache Coherent UMA, CC-UMA)
- Most commonly represented today by Symmetric Multiprocessor (SMP) machines
- Identical processors
- Equal access and equal access times to memory
Note: cache coherence is a hardware mechanism whereby any update of a location in shared memory by one processor is announced to all the other processors.
Source: images retrieved from https://computing.llnl.gov/tutorials/parallel_comp/#SharedMemory
• 16. Parallel Computer Memory Architectures (Cont…)
Non-Uniform Memory Access (NUMA)
• The architecture often links two or more SMPs such that:
- One SMP can directly access the memory of another SMP
- Not all processors have equal access time to all memories
- Memory access across the link is slower
Note: if cache coherence is maintained, the architecture is also called Cache Coherent NUMA (CC-NUMA).
• The proximity of memory to CPUs on a shared memory parallel computer makes data sharing between tasks fast and uniform.
• But there is a lack of scalability between memory and CPUs.
Sources: images retrieved from https://computing.llnl.gov/tutorials/parallel_comp/#SharedMemory; Bruce Jacob, ... David T. Wang, in Memory Systems, 2008
• 18. Parallel Computer Memory Architectures (Cont…)
• Distributed Memory Parallel Computer Architecture
- Comes in as many varieties as shared memory computers.
- Requires a communication network to connect inter-processor memory.
- Each processor operates independently with its own local memory.
- Changes made by an individual processor do not affect the memory of other processors.
- Cache coherency does not apply here!
• Access to data in another processor is usually the task of the programmer (who must explicitly define how and when data is communicated).
• This architecture is cost effective (it can use commodity, off-the-shelf processors and networking).
• But the programmer carries more responsibility for data communication between processors.
Source: retrieved from https://www.futurelearn.com/courses/supercomputing/0/steps/24022
• 19. Parallel Computer Memory Architectures (Cont…)
Overview of Parallel Memory Architecture
Note:
- The largest and fastest computers in the world today employ both shared and distributed memory architectures (hybrid memory)
- In a hybrid design, the shared memory component can be a shared memory machine and/or graphics processing units (GPUs)
- The distributed memory component is the networking of multiple shared memory/GPU machines
- This type of memory architecture will continue to prevail and increase
Source: Nikolaos Ploskas, Nikolaos Samaras, in GPU Programming in MATLAB, 2016
• 20. Hardware Classification
• Parallel computers can be roughly classified according to the level at which the hardware in the parallel architecture supports parallelism.
• Multicore computing
- The processor includes multiple processing units (called "cores") on the same chip.
- Issues multiple instructions per clock cycle from multiple instruction streams.
- Differs from a superscalar processor, which issues multiple instructions per clock cycle from one instruction stream (thread); each core in a multi-core processor can, however, potentially be superscalar as well.
- Example: IBM's Cell microprocessor in the Sony PlayStation 3
• Symmetric multiprocessing (tightly coupled multiprocessing)
- A computer system with multiple identical processors that share memory and connect via a bus
- Usually does not comprise more than 32 processors, to minimize bus contention
- Symmetric multiprocessors are extremely cost-effective
Retrieved from https://en.wikipedia.org/wiki/Parallel_computing#Bit-level_parallelism, 2020
• 21. Hardware Classification (Cont…)
• Distributed computing (distributed memory multiprocessor)
- Not to be confused with decentralized computing (the allocation of resources, hardware and software, to individual workstations)
- Components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another
- The components interact to achieve a common goal
- Characterized by concurrency of components, lack of a global clock, and independent failure of components
- Can include heterogeneous computations, where some nodes perform a lot of computation, some perform very little, and a few others perform specialized functionality
- Example: multiplayer online games
• Cluster computing
- Loosely coupled computers that work together closely
- In some respects they can be regarded as a single computer
- Multiple standalone machines constitute a cluster and are connected by a network
- Computer clusters have each node set to perform the same task, controlled and scheduled by software
- Computer clustering relies on a centralized management approach which makes the nodes available as orchestrated shared servers
- Example: IBM's Sequoia
Sources: Dinkar Sitaram, Geetha Manjunath, in Moving To The Cloud, 2012; Cisco Systems, 2003
• 23. Performance of parallel architectures
• There are various ways to measure the performance of a parallel algorithm running on a parallel processor.
• The most commonly used measurements are:
- Speed-up
- Efficiency / isoefficiency
- Elapsed time (a very important factor)
- Price/performance: elapsed time for a program divided by the cost of the machine that ran the job
Note: none of these metrics should be used independently of the run time of the parallel system.
• Common metrics of performance
- FLOPS and MIPS are units of measure for the numerical computing performance of a computer.
- Distributed computing uses the Internet to link personal computers to achieve more FLOPS.
- MIPS: millions of instructions per second; MIPS = instruction count / (execution time x 10^6)
- MFLOPS: millions of floating-point operations per second; MFLOPS = FP operations in program / (execution time x 10^6)
• Which metric is better?
- MFLOPS is more closely related to the run time of a task in numerical code: the number of FLOPs per program is determined by the problem (e.g. the matrix size). See Chapter 1.
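As a purely illustrative worked example of the two formulas (the numbers are hypothetical, not taken from the slides):
- MIPS: a program that executes 5 x 10^8 instructions in 2 seconds achieves 5 x 10^8 / (2 x 10^6) = 250 MIPS.
- MFLOPS: a kernel that performs 2 x 10^8 floating-point operations in 4 seconds achieves 2 x 10^8 / (4 x 10^6) = 50 MFLOPS.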
• 24. Performance of parallel architectures
“In June 2020, Fugaku turned in a High Performance Linpack (HPL) result of 415.5 petaFLOPS, besting the now second-place Summit system by a factor of 2.8x. Fugaku is powered by Fujitsu’s 48-core A64FX SoC, becoming the first number one system on the list to be powered by ARM processors. In single or further reduced precision, used in machine learning and AI applications, Fugaku’s peak performance is over 1,000 petaflops (1 exaflops). The new system is installed at RIKEN Center for Computational Science (R-CCS) in Kobe, Japan” (Wikipedia, FLOPS, 2020).
[Figure: performance trend over time, annotated "Single CPU Performance", "Here we are!", "The future"]
• 25. Peak and sustained performance
Peak performance
• Measured in MFLOPS
• The highest possible MFLOPS rate, achieved when the system does nothing but numerical computation
• A rough hardware measure
• Gives little indication of how the system will perform in practice
Peak theoretical performance
• Node performance in GFlops = (CPU speed in GHz) x (number of CPU cores) x (floating-point operations per cycle) x (number of CPUs per node)
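As a purely illustrative worked example of the peak theoretical performance formula (a hypothetical node configuration): a node with 2.5 GHz CPUs, 16 cores per CPU, 8 floating-point operations per cycle and 2 CPUs per node peaks at 2.5 x 16 x 8 x 2 = 640 GFlops per node.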
• 26. Peak and sustained performance
• Sustained performance: the MFLOPS rate that a program achieves over its entire run.
• Measuring sustained performance: using benchmarks.
• Peak MFLOPS is usually much larger than sustained MFLOPS.
• Efficiency rate = sustained MFLOPS / peak MFLOPS
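Continuing the hypothetical example above: if a benchmark sustains 80 GFlops on the 640 GFlops node, the efficiency rate is 80 / 640 = 12.5%, illustrating how far sustained performance can fall below peak.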
• 27. Measuring the performance of parallel computers
• Benchmarks: programs that are used to measure performance.
• LINPACK benchmark: a measure of a system’s floating-point computing power.
- Solves a dense N-by-N system of linear equations Ax = b.
- Used to rank supercomputers in the TOP500 list.
No. 1 since June 2020: Fugaku, powered by Fujitsu’s 48-core A64FX SoC, the first number-one system on the list to be powered by ARM processors.
• 28. Other common benchmarks
• Micro benchmark suites
• Numerical computing: LAPACK, ScaLAPACK
• Memory bandwidth: STREAM
• Kernel benchmarks: NPB (NAS Parallel Benchmarks), PARKBENCH, SPEC, SPLASH
• 29. PARALLEL PROGRAMMING MODELS
A programming perspective on how parallelism is implemented in parallel and distributed computer architectures
• 30. Parallel Programming Models
Parallel programming models exist as an abstraction above hardware and memory architectures.
• Several parallel programming models are in common use:
- Shared Memory (without threads)
- Threads
- Distributed Memory / Message Passing
- Data Parallel
- Hybrid
- Single Program Multiple Data (SPMD)
- Multiple Program Multiple Data (MPMD)
• These models are NOT specific to a particular type of machine or memory architecture (a given model can be implemented on any underlying hardware). Example:
- A SHARED memory model on a DISTRIBUTED memory machine: the machine memory is physically distributed across networked machines, but appears at the user level as a single shared global address space (the Kendall Square Research (KSR) ALLCACHE approach).
• 31. Parallel Programming Models
Which model to use? There is no "best" model; however, there are certainly better implementations of some models than others.
• 32. Shared Memory Programming Model (Without Threads)
• A thread is the basic unit to which the operating system allocates processor time; threads are the smallest sequences of programmed instructions.
• In the shared memory programming model:
- Processes/tasks share a common address space, which they read from and write to asynchronously.
- They use mechanisms such as locks and semaphores to control access to the shared memory, resolve contention, and prevent race conditions and deadlocks.
• This may be considered the simplest parallel programming model.
• 33. Shared Memory Programming Model (Cont…)
• Note: locks, mutexes, and semaphores are types of synchronization objects used when resources are shared. They are abstract concepts.
- A lock protects access to some kind of shared resource; owning the lock gives the right to access the protected shared resource. For example, if you have a lockable object ABC you may:
- acquire the lock on ABC,
- take the lock on ABC,
- lock ABC,
- take ownership of ABC, or relinquish ownership of ABC when it is no longer needed.
- A mutex (MUTual EXclusion) is a lockable object that can be owned by exactly one thread at a time.
Example: in C++, std::mutex, std::timed_mutex, std::recursive_mutex.
- A semaphore is a much more relaxed type of lockable object, with a predefined maximum count and a current count.
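As an illustration of mutual exclusion, here is a minimal sketch using POSIX threads in C (compile with -pthread). The course tutorials use OpenMP; pthreads is used here only because it exposes the mutex object directly, and the names counter and worker are hypothetical. Only one thread at a time owns the lock and updates the shared counter.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* acquire ownership of the lock */
        counter++;                    /* critical section on the shared resource */
        pthread_mutex_unlock(&lock);  /* relinquish ownership */
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld (expected 400000)\n", counter);
    return 0;
}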
• 34. Shared Memory Programming Model (Cont…)
Advantages:
• There is no need to specify explicitly the communication of data between tasks, so there is no need to implement "ownership". Very advantageous for the programmer.
• All processes see and have equal access to shared memory.
• Program development can often be simplified.
Disadvantages:
• It becomes more difficult to understand and manage data locality.
• Keeping data local to a given process conserves memory accesses, cache refreshes and bus traffic, but controlling data locality is hard to understand and may be beyond the control of the average user.
During implementation:
• Case: stand-alone shared memory machines. Native operating systems, compilers and/or hardware provide support for shared memory programming; e.g. the POSIX standard provides an API for using shared memory.
• Case: distributed memory machines. Memory is physically distributed across a network of machines, but made global through specialized hardware and software.
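Since the slide mentions that the POSIX standard provides an API for shared memory, here is a minimal, hedged sketch of that API in C (on some systems, link with -lrt). The object name "/demo_shm" and the 4096-byte size are arbitrary choices for illustration; a second, cooperating process could shm_open() the same name and see the same bytes.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const char *name = "/demo_shm";
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);   /* create/open the object */
    if (fd == -1) { perror("shm_open"); return 1; }
    ftruncate(fd, 4096);                                /* size the shared region */
    char *region = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);             /* map it into the address space */
    strcpy(region, "hello from shared memory");         /* write to the shared region */
    printf("%s\n", region);
    munmap(region, 4096);
    close(fd);
    shm_unlink(name);                                   /* remove the object */
    return 0;
}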
• 35. Thread Model
• This is a type of shared memory programming.
• Here, a single "heavy weight" process can have multiple "light weight", concurrent execution paths.
• To understand this model, consider the execution of a main program a.out, scheduled to run by the native operating system:
- a.out starts by loading and acquiring all of the necessary system and user resources to run. This constitutes the "heavy weight" process.
- a.out performs some serial work and then creates a number of tasks (threads) that can be scheduled and run by the operating system concurrently.
- Each thread has local data, but also shares the entire resources of a.out. Threads are "light weight" and benefit from a global memory view because they share the memory space of a.out.
- Synchronization is needed to ensure that no two threads update the same global address at the same time.
• 36. Thread Model (Cont…)
• In practice, thread implementations commonly comprise:
- A library of subroutines that are called from within parallel source code
- A set of compiler directives embedded in either serial or parallel source code
Note: often, the programmer is responsible for determining the parallelism.
• Unrelated standardization efforts have resulted in two very different implementations of threads:
- POSIX Threads: specified by the IEEE POSIX 1003.1c standard (1995); C language only; part of Unix/Linux operating systems; very explicit parallelism that requires significant programmer attention to detail.
- OpenMP (used for the tutorials in the context of this course): an industry standard; compiler-directive based; portable / multi-platform, including Unix and Windows platforms; available in C/C++ and Fortran implementations; can be very easy and simple to use, providing for "incremental parallelism" (you can begin with serial code).
Others include: Microsoft threads; Java and Python threads; CUDA threads for GPUs.
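A minimal OpenMP sketch in C of the fork-join behaviour described on the previous slide (compile with a flag such as gcc -fopenmp); the printed messages are illustrative only.

#include <omp.h>
#include <stdio.h>

int main(void) {
    printf("serial part, single thread (the \"heavy weight\" process)\n");

    #pragma omp parallel            /* fork a team of "light weight" threads */
    {
        int id = omp_get_thread_num();
        printf("hello from thread %d of %d\n", id, omp_get_num_threads());
    }                               /* implicit join: the team disappears here */

    printf("serial part again\n");
    return 0;
}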
• 37. Distributed Memory / Message Passing Model
• In this model, a set of tasks use their own local memory during computation. Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines.
• Tasks exchange data through communication (sending and receiving messages); data transfer usually requires cooperative operations to be performed by each process (a send must have a matching receive).
During implementation:
• The programmer is responsible for determining all parallelism.
• Message passing implementations usually comprise a library of subroutines that are embedded in source code.
• MPI is the "de facto" industry standard for message passing.
- The Message Passing Interface (MPI) specification is available at http://www.mpi-forum.org/docs/.
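A minimal MPI sketch in C of cooperative message passing between two tasks with separate local memories (run with at least two processes, e.g. mpirun -np 2 ./a.out); the value 42 is arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;                       /* each task has its own copy */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);      /* cooperative send */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                             /* matching receive */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}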
• 38. Data Parallel Model
• Can also be referred to as the Partitioned Global Address Space (PGAS) model. Here:
- The address space is treated globally.
- Most of the parallel work focuses on performing operations on a data set, typically organized into a common structure such as an array or cube.
- A set of tasks works collectively on the same data structure; however, each task works on a different partition of that data structure.
- Tasks perform the same operation on their partition of the work, for example "add 4 to every array element".
- The model can be implemented on shared memory architectures (the data structure is accessed through global memory) and on distributed memory architectures (the global data structure can be logically and/or physically split across tasks).
• 39. Data Parallel Model (Cont…)
For the implementation, there are various popular, and sometimes developmental, parallel programming environments based on the Data Parallel / PGAS model:
- Coarray Fortran, compiler dependent. Further reading: https://en.wikipedia.org/wiki/Coarray_Fortran
- Unified Parallel C (UPC), an extension to the C programming language for SPMD parallel programming. Further reading: http://upc.lbl.gov/
- Global Arrays, a shared-memory-style programming environment in the context of distributed array data structures. Further reading: https://en.wikipedia.org/wiki/Global_Arrays
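A minimal sketch of the data-parallel idea on a shared-memory machine, written here with OpenMP rather than one of the PGAS languages listed above: every task applies the same operation, "add 4 to every array element", to its own partition of the array. The array size N is arbitrary.

#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];             /* zero-initialized shared array */

    #pragma omp parallel for        /* the loop iterations are partitioned  */
    for (int i = 0; i < N; i++)     /* across threads; each thread works on */
        a[i] += 4.0;                /* a different chunk of the same array  */

    printf("a[0] = %.1f, a[N-1] = %.1f\n", a[0], a[N - 1]);
    return 0;
}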
• 40. Single Program Multiple Data (SPMD) / Multiple Program Multiple Data (MPMD)
Both are "high level" programming models that can be built on top of any other parallel programming model.
Single Program Multiple Data (SPMD):
- Why SINGLE program? All tasks execute their copy of the same program (threads, message passing, data parallel or hybrid) simultaneously.
- Why MULTIPLE data? All tasks may use different data.
- "Intelligent" enough: tasks do not necessarily have to execute the entire program.
Multiple Program Multiple Data (MPMD):
- Why MULTIPLE program? Tasks may execute different programs (threads, message passing, data parallel or hybrid) simultaneously.
- Why MULTIPLE data? All tasks may use different data.
- Not as "intelligent" as SPMD, but may be better suited to certain types of problems (functional decomposition problems).
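A minimal SPMD sketch in C with MPI: every task runs a copy of the same program but branches on its rank, so tasks do not all execute the same portion of the work. The division of roles shown is hypothetical.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        printf("rank 0: doing the I/O and coordination part\n");   /* one role */
    else
        printf("rank %d: doing a slice of the computation\n", rank); /* another role */

    MPI_Finalize();
    return 0;
}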
• 41. Conclusion
• Parallel computer architectures contribute to achieving maximum performance within the limit given by the technology.
• The diversity of parallel computer architectures makes the field challenging to learn and challenging to present.
• Classification can be based on the number of instruction and data streams and how instructions operate on data: Flynn's taxonomy (SISD, SIMD, MISD, MIMD).
• Classification can also be based on how parallelism is achieved (data-parallel architectures, function-parallel architectures).
• Classification can as well focus on how processors communicate with memory (shared memory, or tightly coupled, computers; distributed memory, or loosely coupled, systems).
• There must be a way to appreciate the performance of the parallel architecture.
• FLOPS and MIPS are units of measure for the numerical computing performance of a computer.
• Parallelism is made possible by implementing adequate parallel programming models.
• The simplest model appears to be the shared memory programming model.
• SPMD and MPMD programming require mastery of the previous programming models for proper implementation.
• How do we then design a parallel program for effective parallelism? See the next chapter: Designing Parallel Programs and understanding the notions of Concurrency and Decomposition.
• 42. Challenge your understanding
1- What difference do you make between a parallel computer and parallel computing?
2- What do you understand by true data dependency and resource dependency?
3- Illustrate the notions of vertical waste and horizontal waste.
4- According to you, which of the design architectures can provide better performance? Use performance metrics to justify your arguments.
5- What is concurrent-read, concurrent-write (CRCW) PRAM?
6- On this figure, we have an illustration of (a) bus-based interconnects with no local caches and (b) bus-based interconnects with local memory/caches. Explain the difference, focusing on:
- The design architecture
- The operation
- The pros and cons
7- Discuss Handler's classification of computer architectures compared to Flynn's and other classifications.
• 43. Class Work: Group and Presentation
• Purpose: demonstrate the conditions for detecting potential parallelism.
"Parallel computing requires that the segments to be executed in parallel must be independent of each other. So, before executing in parallel, all the conditions of parallelism between the segments must be analyzed."
Use Bernstein's conditions for the detection of parallelism to demonstrate when instructions i1, i2, …, in can be said to be "parallelizable".
• 44. REFERENCES
1. Xin Yuan, CIS4930/CDA5125: Parallel and Distributed Systems. Retrieved from http://www.cs.fsu.edu/~xyuan/cda5125/index.html
2. EECC722 – Shaaban, Lecture #3, Fall 2000, 9-18-2000.
3. Blaise Barney, Lawrence Livermore National Laboratory, https://computing.llnl.gov/tutorials/parallel_comp/#ModelsOverview, last modified 11/02/2020.
4. J. Blazewicz et al., Handbook on Parallel and Distributed Processing, International Handbooks on Information Systems, Springer, 2000.
5. Phillip J. Windley, Parallel Architectures, Lesson 6, CS462: Large Scale Distributed Systems, 2020.
6. A. Grama et al., Introduction to Parallel Computing, Lecture 3.
  • 45. END.