What is Parallel Computing
•Traditionally, software has been written for
serial computation:
–To be run on a single computer having a single
Central Processing Unit (CPU);
–A problem is broken into a discrete series of
instructions.
–Instructions are executed one after another.
–Only one instruction may execute at any moment
in time.
•Parallel computing is the simultaneous use of
multiple compute resources to solve a
computational problem.
–To be run using multiple CPUs
–A problem is broken into discrete parts that can
be solved concurrently
–Each part is further broken down to a series of
instructions
•Instructions from each part execute simultaneously
on different CPUs
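The decomposition described above can be sketched with Python's standard multiprocessing module; the chunk size and the part_sum worker used here are illustrative choices, not part of the original material:

```python
from multiprocessing import Pool

def part_sum(chunk):
    """One 'part' of the problem: sum a slice of the data."""
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Break the problem into discrete parts that can be solved concurrently.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        # Instructions for each part execute simultaneously on different CPUs.
        partials = pool.map(part_sum, chunks)
    return sum(partials)

if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))  # 5050
```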
Why Parallel Computing?
•The primary reasons for using parallel
computing:
–Save time
–Solve large problems
–Provide concurrency (do multiple things at the
same time)
–Taking advantage of non-local resources
–Overcoming memory constraints
–Cost savings
Basic Design
•Basic design
–Memory is used to store both program and data
instructions
–Program instructions are coded data which tell
the computer to do something
–Data is simply information to be used by the
program
•A central processing unit (CPU) gets
instructions and/or data from memory,
decodes the instructions and then
sequentially performs them.
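As a rough illustration of this fetch-decode-execute cycle, here is a toy interpreter in Python; the instruction format (an opcode plus a memory address) is an invented example, not a real machine's instruction set:

```python
def run(program, data):
    """Toy von Neumann cycle: fetch, decode, and execute sequentially."""
    pc = 0       # program counter
    acc = 0      # accumulator
    while pc < len(program):
        op, addr = program[pc]     # fetch the next instruction from memory
        if op == "LOAD":           # decode...
            acc = data[addr]       # ...and execute
        elif op == "ADD":
            acc += data[addr]
        elif op == "STORE":
            data[addr] = acc
        pc += 1                    # only one instruction executes at a time
    return data

data = [2, 3, 0]
run([("LOAD", 0), ("ADD", 1), ("STORE", 2)], data)
print(data[2])  # 5
```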
Parallel Computer Models
Classification of parallel architectures
•Flynn’s taxonomy
•Classification based on the memory
arrangement
•Classification based on type of interconnection
Flynn’s Taxonomy
– The most widely accepted method of
classifying computer systems
– Any computer can be placed in one of 4 broad
categories
» SISD: Single instruction stream, single data
stream
» SIMD: Single instruction stream, multiple data
streams
» MISD: Multiple instruction streams, single data
stream
» MIMD: Multiple instruction streams, multiple
data streams
SISD
[Diagram: the control unit issues a single instruction stream (IS) to one
processing element (PE), which exchanges a single data stream (DS) with
main memory (M).]
Single Instruction, Single Data
(SISD)
•A serial (non-parallel) computer
•Single instruction: only one instruction
stream is being acted on by the CPU during
any one clock cycle
•Single data: only one data stream is being
used as input during any one clock cycle
•This is the oldest and until recently, the most
prevalent form of computer
•Examples: most PCs, single CPU workstations
and mainframes
SIMD
•A type of parallel computer
•Single instruction: All processing units execute the same instruction at any
given clock cycle
•Multiple data: Each processing unit can operate on a different data element
•Best suited for specialized problems characterized by a high degree of
regularity such as image processing.
•Two varieties: Processor Arrays and Vector Pipelines
•Examples:
–Processor Arrays: Connection Machine CM-2, Maspar MP-1, MP-2
–Vector Pipelines: IBM 9000, Cray C90, Fujitsu VP, NEC SX-2, Hitachi S820
Applications
•Image processing
• Matrix manipulations
• Sorting
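NumPy's elementwise array operations give a feel for the SIMD model in software: one operation is applied across many data elements at once. The brighten-an-image example below is an illustrative assumption, not from the slides:

```python
import numpy as np

# SIMD-style image processing: one 'brighten' operation is applied to
# every pixel at once instead of a per-pixel scalar loop.
image = np.array([[10, 20], [30, 40]], dtype=np.uint16)
brightened = np.clip(image + 50, 0, 255)
print(brightened.tolist())  # [[60, 70], [80, 90]]
```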
MISD
• A single data stream is fed into multiple processing
units.
• Each processing unit operates on the data
independently via independent instruction streams.
• Many functional units perform different operations
on the same data.
Applications
•Classification
• Robot vision
MIMD
•Currently, the most common type of parallel
computer. Most modern computers fall into this
category.
•Multiple Instruction: every processor may be
executing a different instruction stream
•Multiple Data: every processor may be working
with a different data stream
•Execution can be synchronous or asynchronous,
deterministic or non-deterministic
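A minimal MIMD-flavoured sketch in Python: two threads run different instruction streams (count_words and count_chars, both invented for this example) on different data streams, asynchronously:

```python
import threading

results = {}

def count_words(text):
    # One instruction stream...
    results["words"] = len(text.split())

def count_chars(text):
    # ...and a different instruction stream.
    results["chars"] = len(text)

# Multiple instruction streams operate on multiple data streams, asynchronously.
t1 = threading.Thread(target=count_words, args=("parallel computing with MIMD",))
t2 = threading.Thread(target=count_chars, args=("another data stream",))
t1.start(); t2.start()
t1.join(); t2.join()
print(results["words"], results["chars"])  # 4 19
```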
Classification based on memory
arrangement
[Diagram: two memory arrangements. Shared memory (multiprocessors):
processors PE1…PEn share a common memory through an interconnection
network. Message passing (multicomputers): each node pairs a processor
(P1…Pn) with its own local memory (M1…Mn), and nodes communicate over
an interconnection network.]
•Multiple processors can operate independently but share
the same memory resources.
•Changes in a memory location made by one processor
are visible to all other processors.
•Processors easily communicate by means of shared
variables
•Shared memory machines can be divided into two main
classes based upon memory access times: UMA and NUMA
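A small sketch of the shared-memory model using Python's multiprocessing module (assuming a POSIX-style fork start method): several workers update one shared counter, and the lock plays the role of the synchronization the programmer must provide:

```python
from multiprocessing import Process, Value, Lock

def worker(counter, lock, n):
    for _ in range(n):
        with lock:                     # programmer-supplied synchronization
            counter.value += 1         # all workers update the same location

def demo(workers=4, n=1000):
    counter = Value("i", 0)            # one memory location visible to all
    lock = Lock()
    procs = [Process(target=worker, args=(counter, lock, n))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == "__main__":
    print(demo())  # 4000
```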
Shared Memory: Pro and Con
•Advantages
–Global address space provides a user-friendly programming
perspective to memory
–Data sharing between tasks is both fast and uniform due to
the proximity of memory to CPUs
•Disadvantages:
–Primary disadvantage is the lack of scalability between
memory and CPUs.
–Adding more CPUs can geometrically increase traffic on
the shared memory
–The programmer is responsible for synchronization constructs
that ensure "correct" access of global memory.
–Expense: it becomes increasingly difficult and expensive to
design and produce shared memory machines with ever
increasing numbers of processors.
Distributed memory multicomputers
[Diagram: distributed-memory multicomputer. Each processing element (PE)
has its own local memory (M); the PEs are linked by an interconnection
network.]
•Processors have their own local memory.
•Memory addresses in one processor do not map to another
processor, so there is no concept of global address space across
all processors.
•Because each processor has its own local memory, it operates
independently.
•Changes it makes to its local memory have no effect on the
memory of other processors. Hence, the concept of cache
coherency does not apply.
•When a processor needs access to data in another processor, it
is usually the task of the programmer to explicitly define how
and when data is communicated. Synchronization between tasks
is likewise the programmer's responsibility
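Message passing can be sketched with multiprocessing queues: the worker sees only the data that is explicitly sent to it. The worker function and the data are invented for illustration:

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # The worker has no view of the parent's memory; data arrives as messages.
    chunk = inbox.get()
    outbox.put(sum(chunk))

def demo():
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    inbox.put([1, 2, 3, 4])    # explicitly send the data...
    result = outbox.get()      # ...and explicitly receive the result
    p.join()
    return result

if __name__ == "__main__":
    print(demo())  # 10
```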
Distributed Memory: Pro and Con
•Advantages
–Memory is scalable with number of processors. Increase the number of
processors and the size of memory increases proportionately.
–Each processor can rapidly access its own memory without interference
and without the overhead incurred with trying to maintain cache
coherency.
–Cost effectiveness: can use commodity, off-the-shelf processors and
networking.
•Disadvantages
–The programmer is responsible for many of the details associated with
data communication between processors.
–It may be difficult to map existing data structures, based on global
memory, to this memory organization.
–Non-uniform memory access (NUMA) times
Classification based on type of
interconnections
•Static networks
•Dynamic networks
Scalar and vector processors
•Scalar processors are the most basic type of
processor.
•These process one item at a time, typically
integers or floating-point numbers (values, such as
fractions and very large or very small magnitudes,
that integers cannot represent).
•Because each instruction is handled sequentially,
basic scalar processing can be slow for large workloads.
• Vector processors operate on an array of data points.
• Rather than handling each item individually, multiple
items that all have the same instruction can be handled at
once.
•This can save time over scalar processing, but also adds
complexity to a system, which can slow other functions.
•Vector processing works best when there is a large amount
of data to be processed, groups of which can be handled by
one instruction
Conventional scalar processor
Initialize I = 0
20 Read A(I)
Read B(I)
Store C(I) = A(I) + B(I)
Increment I = I + 1
If I < 100 go to 20
Continue
Vector processor
Single vector instruction
C(1:100) = A(1:100) + B(1:100)
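The contrast can be mimicked with NumPy, whose array arithmetic plays the role of the single vector instruction C(1:100) = A(1:100) + B(1:100):

```python
import numpy as np

a = np.arange(1, 101)
b = np.arange(1, 101)

# Scalar style: one addition per loop iteration, as in the loop above.
c_scalar = np.empty(100, dtype=a.dtype)
for i in range(100):
    c_scalar[i] = a[i] + b[i]

# Vector style: one operation over all 100 elements at once.
c_vector = a + b

print(int(c_vector[0]), int(c_vector[99]))  # 2 200
```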
Vector processor can process
• Vector Data type
• Apply same operation on all elements of the
vector
• No dependencies amongst elements
• Same motivation as SIMD
What is vector processing?
•A vector processor is one that can compute operations on entire
vectors with one simple instruction.
•A vector compiler will attempt to translate loops into single
vector instructions.
•Example - Suppose we have the following for loop:
for i = 1, n
X(i) = Y(i) + Z(i)
continue
•This will be translated into one long vector of length n and a
vector add instruction will be executed.
Why is this more efficient?
#1: Because there is only a need for one instruction, the vector
processor will not have to fetch and decode as many
instructions; thus, memory bandwidth and the control unit
overhead are reduced considerably.
#2: The vector processor, after receiving the instruction, knows
that it must fetch x pairs of operands. As they are
received, they are passed directly to a pipelined functional unit
that processes them.
There are two specific kinds of machines:
#1: Memory to memory: operands are fetched from
memory and passed on directly to the functional unit.
The results are then written back out to memory to
complete the process.
#2: Register to register: operands are loaded into a set
of vector registers, the operands are fetched from the
vector registers and the results are returned to a vector
register.
Vector Instruction Set Advantages
•Compact
–one short instruction encodes N operations
•Expressive, tells hardware that these N
operations:
–are independent
–use the same functional unit
–access disjoint registers
–access registers in the same pattern as previous instructions
–access a contiguous block of memory (unit-stride load/store)
–access memory in a known pattern (strided load/store)
•Scalable
–can run same object code on more parallel pipelines or lanes
Disadvantages
•Not as fast with scalar instructions
•Complexity
•Difficulties in implementing
•High price of on-chip vector memory systems
•Increased code complexity
Applications
•Servers
•Home Cinema
•Super Computing
•Cluster Computing
•Mainframes
Fall 2008, Introduction to Parallel Processing
Array Computers
•An array processor is a synchronous parallel
computer with multiple arithmetic logic units, called
processing elements (PE), that can operate in
parallel.
•The PEs are synchronized to perform the same
function at the same time.
•Only a few array computers are designed primarily
for numerical computation, while the others are for
research purposes.
Functional structure of
array computer
• Array processors are closely related to multiprocessors and vector processors.
They perform computations on large arrays of data and are therefore used to
improve the performance of the computer.
• Two types of Array Processor:
– Attached Array Processors
– SIMD Array Processors
Attached Array Processors:
• An attached array processor is a processor which is attached to a general
purpose computer and its purpose is to enhance and improve the performance
of that computer in numerical computational tasks.
• It achieves high performance by means of parallel processing with multiple
functional units.
SIMD Array Processors
• SIMD is the organization of a single computer containing multiple processors
operating in parallel.
• The processing units are made to operate under the control of a common
control unit, thus providing a single instruction stream and multiple data
streams.
• A general block diagram of an array processor is shown on the next slide.
• It contains a set of identical processing elements (PEs), each of which
has a local memory M.
• Each processor element includes an ALU and registers.
• The master control unit controls all the operations of the processor elements.
It also decodes the instructions and determines how the instruction is to be
executed.
• The main memory is used for storing the program.
• The control unit is responsible for fetching the instructions.
• Vector instructions are sent to all PEs simultaneously and results are
returned to the memory.
• The best known SIMD array processor is the ILLIAC IV computer
developed by the Burroughs Corporation. SIMD processors are highly specialized
computers.
• They are only suitable for numerical problems that can be expressed in vector
or matrix form and they are not suitable for other types of computations.
Why use the Array Processor
•Array processors increase the overall instruction processing speed.
•Because most array processors operate asynchronously from the host CPU, they
improve the overall capacity of the system.
•Array processors have their own local memory, providing extra memory for
systems with low memory.