COMPUTER ARCHITECTURE
Parallel Processing
Presented by:
MUHAMMAD DANIYAL QURESHI
COMPUTER SCIENCE
SHAH ABDUL LATIF UNIVERSITY
Sequential vs. Parallel Processing
Assume one sandwich is made by taking a slice of bread, then a slice of
cheese, perhaps a piece of meat, and finally a second slice of bread. A
sequential process makes sandwiches by repeating this sequence, one
sandwich at a time. In parallel processing, multiple sandwiches can be
made in less time: lay out multiple slices of bread, then multiple slices
of cheese, then multiple pieces of meat, then the top slices, completing
the ingredients of every sandwich at each step.
Parallel Processing
 Parallel processing is the division of a program's instructions among
multiple processors so that the program runs in less time.
 A computation-intensive program that took one hour to run and a tape-
copying program that took one hour to run would take a total of two
hours to run, one after the other. (Sequential Processing)
 An early form of parallel processing allowed the interleaved execution of
both programs together. The computer would start an I/O operation,
and while it was waiting for the operation to complete, it would execute
the processor-intensive program. The total execution time for the two
jobs would be a little over one hour. (Parallel Processing)
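This timing claim can be sketched with two POSIX threads: one blocks on simulated I/O while the other computes, so total wall time is roughly the longer job rather than the sum. A minimal illustrative sketch, in which a one-second sleep stands in for the tape copy:

```c
/* Overlapping an I/O-bound job with a CPU-bound job using POSIX threads.
 * Build: gcc -pthread overlap.c -o overlap */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *io_job(void *arg) {
    (void)arg;
    sleep(1);                     /* stands in for waiting on a tape copy */
    puts("I/O job finished");
    return NULL;
}

static void *cpu_job(void *arg) {
    (void)arg;
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 100000000UL; i++)
        sum += i;                 /* processor-intensive work */
    puts("CPU job finished");
    return NULL;
}

int main(void) {
    pthread_t io, cpu;
    pthread_create(&io, NULL, io_job, NULL);
    pthread_create(&cpu, NULL, cpu_job, NULL);
    pthread_join(io, NULL);
    pthread_join(cpu, NULL);      /* wall time ~ max of the two jobs, not the sum */
    return 0;
}
```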
Processor Systems
S.I.S.D. (Single Instruction, Single Data Stream)
 A single processor executes a single instruction stream to
operate on data stored in a single memory.
S.I.M.D. (Single Instruction, Multiple Data Stream)
 A single machine instruction controls the simultaneous
execution of a number of processing elements on a lockstep
basis. Each processing element has an associated data
memory, so each instruction is executed on a different set
of data by the different processors.
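A concrete modern form of this lockstep execution is a vector instruction. A minimal sketch using x86 SSE intrinsics, assuming an x86-64 compiler; the single `_mm_add_ps` instruction adds four float lanes at once:

```c
/* One instruction, four data elements: SIMD addition with SSE.
 * Build: gcc -msse simd.c -o simd (x86-64 assumed) */
#include <xmmintrin.h>
#include <stdio.h>

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4];
    __m128 va = _mm_loadu_ps(a);       /* load four floats */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);    /* ONE instruction adds all four lanes */
    _mm_storeu_ps(c, vc);
    printf("%.0f %.0f %.0f %.0f\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```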
Processor Systems
M.I.S.D. (Multiple Instruction, Single Data Stream)
 A sequence of data is transmitted to a set of processors,
each of which executes a different instruction sequence.
This structure has not been commercially implemented.
M.I.M.D. (Multiple Instruction, Multiple Data Stream)
 A set of processors simultaneously executes different
instruction sequences on different data sets.
 SMPs, clusters, and NUMA systems fit this category.
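In MIMD terms, each thread below is a separate instruction stream operating on its own data set; on a multiprocessor the OS may schedule them simultaneously. A minimal sketch:

```c
/* MIMD in miniature: two different instruction streams on different data.
 * Build: gcc -pthread mimd.c -o mimd */
#include <pthread.h>
#include <stdio.h>

static void *sum_stream(void *arg) {        /* instruction stream #1 */
    int *data = arg, total = 0;
    for (int i = 0; i < 4; i++) total += data[i];
    printf("sum = %d\n", total);
    return NULL;
}

static void *max_stream(void *arg) {        /* instruction stream #2 */
    int *data = arg, best = data[0];
    for (int i = 1; i < 4; i++) if (data[i] > best) best = data[i];
    printf("max = %d\n", best);
    return NULL;
}

int main(void) {
    int a[4] = {1, 2, 3, 4}, b[4] = {7, 3, 9, 5};   /* different data sets */
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_stream, a);
    pthread_create(&t2, NULL, max_stream, b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```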
Parallel Processor Classification
With the MIMD organization, the processors are general purpose; each
is able to process all of the instructions necessary to perform the
appropriate data transformation. MIMDs can be further subdivided by
the means by which the processors communicate. If the processors
share a common memory, each processor accesses programs and data
stored in the shared memory.
Alternative Computer Organization
Single Instruction, Single Data Stream
A control unit provides an instruction stream to a single processing
unit. The processing unit operates on a single data stream from a
memory unit.
Alternative Computer Organization
Single Instruction, Multiple Data Stream
There is still a single control unit, now feeding a single instruction
stream to multiple processing units. Each processing unit may have its
own dedicated memory, or there may be a shared memory.
Alternative Computer Organization
Multiple Instruction, Multiple Data Stream (Shared or Distributed Memory)
In MIMD, there are multiple control units, each feeding a separate
instruction stream to its own processing unit. The MIMD machine may be
a shared-memory multiprocessor or a distributed-memory multicomputer.
Multiprocessor Operating System Design Considerations
 OS routines must be reentrant so that several processors can execute the same OS
code at the same time, and OS data structures must be managed properly to avoid
invalid operations. (Simultaneous Parallel Processing)
 Any processor may perform scheduling, so conflicts must be avoided, and the
scheduler must assign ready processes to available processors. (Scheduling)
 Care must be taken to provide effective synchronization. Synchronization is a facility
that enforces mutual exclusion and event ordering; see the sketch after this list.
(Synchronization)
 The OS needs to exploit the available hardware parallelism to achieve the best
performance. (Memory Management)
 The scheduler and other portions of the operating system must recognize the loss of a
processor and restructure accordingly, providing graceful degradation in the face of
processor failure. (Reliability and Fault Tolerance)
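A hedged illustration of the synchronization point above: a POSIX mutex enforces mutual exclusion on a shared counter. Without the lock, the two threads' increments would race and updates could be lost.

```c
/* Mutual exclusion on an SMP: protecting a shared counter with a mutex.
 * Build: gcc -pthread mutex.c -o mutex */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);    /* only one processor enters at a time */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}
```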
Symmetric Multiprocessor System
 Symmetric means all processors can perform the same functions.
 Two or more similar processors of comparable capacity.
 Processors share the same memory and I/O facilities.
 Processors are connected by a bus or other internal connection.
 Memory access time is approximately the same for each processor.
 All processors share access to I/O devices, either through the same
channels or through different channels that provide paths to the
same devices.
 The system is controlled by an integrated operating system, which
provides interaction between processors and their programs at the
job, task, file, and data element levels.
The processors can intercommunicate through shared memory. It may also be possible for
processors to exchange signals directly. The memory is often organized so that multiple
simultaneous accesses to separate blocks of memory are possible. Each processor may also
have its own private main memory and I/O channels in addition to the shared resources.
Bus Organization In S.M.P.
Advantages
 Simplest approach to
multiprocessor organization.
 Easy to expand the system by
attaching more processors to the
bus.
 The bus is essentially a passive
medium and the failure of any
attached device should not cause
failure of the whole system.
Disadvantages
 Performance is limited by bus
cycle time, because all memory
references pass through the
shared bus.
 To relieve the bus, each processor
should have a cache, which
reduces the number of bus
accesses.
 Caching, however, leads to
problems with cache coherence.
Cache Coherence
Definition
 Cache coherence is the
consistency of shared-resource
data that ends up stored in
multiple local caches.
 When clients in a system maintain
caches of a common memory
resource, problems may arise with
inconsistent data, which is
particularly the case with CPUs in
a multiprocessing system.
Writing Policies
 When a system writes data to cache,
it must at some point write that data
to the backing store. The timing of
this write is controlled by what is
known as the write policy.
 APPROACHES:
 WRITE-THROUGH: The write is done
synchronously both to the cache and to
the backing store.
 WRITE-BACK: The write to the backing
store is postponed until the cache
blocks containing the data are about to
be modified or replaced by new content.
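The two policies can be contrasted with a toy one-line cache; the structures here are illustrative, not a real cache design. Write-through updates the backing store on every write, write-back only on eviction:

```c
/* Toy one-line cache contrasting write-through and write-back policies. */
#include <stdio.h>
#include <stdbool.h>

static int backing_store = 0;     /* "main memory" */

struct cache { int value; bool valid, dirty, write_through; };

static void cache_write(struct cache *c, int v) {
    c->value = v; c->valid = true;
    if (c->write_through)
        backing_store = v;        /* write-through: memory updated at once */
    else
        c->dirty = true;          /* write-back: memory updated later */
}

static void cache_evict(struct cache *c) {
    if (c->valid && c->dirty)
        backing_store = c->value; /* flush the postponed write */
    c->valid = c->dirty = false;
}

int main(void) {
    struct cache wt = {0, false, false, true}, wb = {0, false, false, false};
    cache_write(&wt, 7);
    printf("write-through: memory = %d\n", backing_store);          /* 7 at once */
    backing_store = 0;
    cache_write(&wb, 9);
    printf("write-back before evict: memory = %d\n", backing_store); /* still 0 */
    cache_evict(&wb);
    printf("write-back after evict:  memory = %d\n", backing_store); /* now 9 */
    return 0;
}
```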
Possible Problem With Cache Coherence Using Bus Organization
Multiple copies of the same data can exist in different caches
simultaneously, and if processors are allowed to update their own
copies freely, an inconsistent view of memory can result.
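The problem can be shown in a few lines: two processors each cache the same word, one updates its private copy, and the other then reads a stale value. Plain variables stand in for the per-processor caches:

```c
/* The coherence problem in miniature: two private copies diverge. */
#include <stdio.h>

int main(void) {
    int memory = 5;          /* shared word in main memory */
    int cache_p0 = memory;   /* processor 0 caches it */
    int cache_p1 = memory;   /* processor 1 caches it */

    cache_p0 = 42;           /* P0 updates its own copy freely (write-back) */

    /* P1 still sees the old value: an inconsistent view of memory. */
    printf("P0 sees %d, P1 sees %d, memory holds %d\n",
           cache_p0, cache_p1, memory);
    return 0;
}
```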
Solutions To The Cache Coherence Problem
Software Based
 Software-based protocols rely upon
the operating system and compiler.
 The compiler performs analysis on
the code to determine which data
items are unsafe for caching and
marks them accordingly.
 The operating system then prevents
uncacheable items from being cached.
 The software approach is attractive
because the overhead of detecting
potential problems is transferred from
run time to compile time.
Hardware Based
 Also known as cache coherence protocols.
 These solutions provide dynamic recognition
at run time of potential inconsistency conditions.
 Hardware-based schemes lead to improved
performance over a software approach.
 The approaches are transparent to the
programmer and the compiler, reducing
the software development burden.
 They can be divided into two categories,
detailed below:
 Directory protocols
 Snoopy protocols
 DIRECTORY PROTOCOL:
 A central controller collects and maintains
information about where copies of lines
reside in the various local caches.
 It keeps this information up to date and
manages which caches hold a copy of
each line.
 DRAWBACK: The central directory is a
potential bottleneck and adds communication
overhead, although directory schemes are
effective in large-scale systems with multiple
buses or other complex interconnection schemes.
 SNOOPY CACHE PROTOCOL: Responsibility
for maintaining cache coherence is distributed
among all of the cache controllers in the
multiprocessor.
 BASIC APPROACHES: Write Invalidate and
Write Update.
 Write Invalidate protocol: multiple readers but
a single writer; only one cache may write a line
at a time, and a write invalidates all other
cached copies (a sketch follows this list).
 Write Update protocol: multiple readers and
multiple writers; an updated word is
distributed to all other caches.
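A toy model of the write-invalidate approach, with an array standing in for the caches and a loop standing in for the bus snoop; no real hardware behavior is claimed:

```c
/* Toy write-invalidate snoopy protocol: a write broadcasts an invalidate. */
#include <stdio.h>
#include <stdbool.h>

#define NCACHES 3
static int memory = 5;                       /* the shared line in memory */
static struct { int value; bool valid; } cache[NCACHES];

static int read_line(int id) {
    if (!cache[id].valid) {                  /* miss: fetch from memory */
        cache[id].value = memory;
        cache[id].valid = true;
    }
    return cache[id].value;
}

static void write_line(int id, int v) {
    for (int i = 0; i < NCACHES; i++)        /* snoop: invalidate all others */
        if (i != id)
            cache[i].valid = false;
    cache[id].value = v;                     /* the single writer's copy */
    cache[id].valid = true;
    memory = v;                              /* write-through for simplicity */
}

int main(void) {
    printf("P1 reads %d\n", read_line(1));   /* 5 */
    write_line(0, 42);                       /* P0 writes; P1's copy invalidated */
    printf("P1 reads %d\n", read_line(1));   /* re-fetch: 42, not a stale 5 */
    return 0;
}
```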
Cluster
 A cluster is a group of tightly or loosely coupled computers that
work together as a single system.
 Clusters are commonly, but not always, connected through fast
local area networks.
 A group of interconnected WHOLE COMPUTERS working together
can create the illusion of being one machine with parallel
processing power.
 A whole computer is a system that can run on its own, apart from
the cluster; such systems are widely used as building blocks for
server systems.
 Each computer in a cluster is called a NODE.
Cluster Products
 Pictured: the IBM Hydro-Cluster.
 The VAXcluster was developed by
DEC in the 1980s.
 Microsoft, Sun Microsystems, and
other companies also offer cluster
packages.
 Linux has long been the most widely
used operating system for cluster
computers around the world.
Cluster Architecture
The individual computers are connected by
some high-speed LAN or switch hardware.
Each computer is capable of operating
independently. In addition, a middleware
layer of software is installed in each
computer to enable cluster operation. The
cluster middleware provides a unified
system image to the user, known as a
single-system image. The middleware is
also responsible for providing high
availability, by means of load balancing and
responding to failures in individual
components. A cluster will also include
software tools for enabling the efficient
execution of programs that are capable of
parallel execution.
Comparing Clusters With Symmetric Multiprocessors
Symmetric Multiprocessor
 Easier to manage and
configure.
 Less physical space and lower
power consumption.
 Well established and stable.
Clusters
 Far superior in terms of
incremental and absolute
scalability.
 Superior in terms of availability.
 All components of the system can
readily be made highly redundant.
 Both provide a configuration with multiple processors to support high-demand applications.
 Both solutions are available commercially.
Parallelized Computing
 Effective use of a cluster requires executing the software of a single
application in parallel.
 The following are three general approaches to the problem:
 PARALLELIZING COMPILER:
 Determines at compile time which parts of an application can be executed in parallel.
 These parts are then split off to be assigned to different computers in the cluster.
 PARALLELIZED APPLICATION:
 The application is written from the outset to run on a cluster and uses message passing to
move data between cluster nodes (see the sketch after this list).
 PARAMETRIC COMPUTING:
 Can be used if the essence of the application is an algorithm or program that must be executed
a large number of times, each time with a different set of starting conditions or parameters.
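A hedged sketch of these approaches using MPI (assuming an MPI installation such as Open MPI, whose standard tools are `mpicc` and `mpirun`): each rank stands in for a cluster node, runs the same program with a different parameter, and returns its result to rank 0 by message passing.

```c
/* Parametric computing over message passing: each MPI rank (one per
 * cluster node) runs the same program with a different parameter.
 * Build/run (Open MPI assumed): mpicc param.c -o param && mpirun -np 4 param */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double param = 10.0 * (rank + 1);    /* a different starting condition per node */
    double result = param * param * 0.5; /* placeholder computation */

    if (rank != 0) {
        MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else {
        printf("rank 0: param %.0f -> %.1f\n", param, result);
        for (int src = 1; src < size; src++) {
            double r;
            MPI_Recv(&r, 1, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank %d -> %.1f\n", src, r);
        }
    }
    MPI_Finalize();
    return 0;
}
```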
Non-Uniform Memory Access
 Alternative to SMP and clustering
 Uniform memory access (UMA)
 All processors have access to all parts of main memory using loads and stores
 Access time to all regions of memory is the same
 Access time to memory for different processors is the same
 Non-uniform memory access (NUMA)
 All processors have access to all parts of main memory using loads and stores
 A processor's access time differs depending on which region of main memory is
being accessed
 Different processors access different regions of memory at different speeds
 Cache-coherent NUMA (CC-NUMA)
 A NUMA system in which cache coherence is maintained among the caches of the
various processors
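On Linux, the NUMA distinction is visible through the libnuma library. A minimal sketch, assuming Linux with libnuma installed (link with -lnuma): memory is placed on node 0, so touching it is a local access for node-0 processors and a remote access for all others.

```c
/* Placing memory on a specific NUMA node with libnuma (Linux).
 * Build: gcc numa_demo.c -o numa_demo -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }
    printf("nodes: 0..%d\n", numa_max_node());

    /* Allocate 1 MiB on node 0: local for node-0 CPUs, remote for others. */
    size_t sz = 1 << 20;
    char *buf = numa_alloc_onnode(sz, 0);
    if (!buf) { perror("numa_alloc_onnode"); return 1; }
    memset(buf, 0, sz);        /* touching it from another node costs more */
    numa_free(buf, sz);
    return 0;
}
```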
Objective Of N.U.M.A. In Comparison
SYMMETRIC MULTIPROCESSOR
 Has a practical limit to the number of
processors that can be used.
 Bus traffic limits the system to between
16 and 64 processors.
CLUSTER
 Each node has its own private main
memory.
 Coherency is maintained by software
rather than hardware.
NONUNIFORM MEMORY ACCESS
 NUMA retains the flavor of SMP while
supporting large-scale multiprocessing.
 The objective is to maintain a transparent
system-wide memory while permitting
multiple multiprocessor nodes, each with
its own bus or internal interconnect system.
Cache-Coherent Non-Uniform Memory Access Organization
There are multiple independent nodes, each of which is, in
effect, an SMP organization. Thus, each node contains multiple
processors, each with its own L1 and L2 caches, plus main memory.
The node is the basic building block of the overall CC-NUMA
organization. For example, each Silicon Graphics Origin node
includes two MIPS R10000 processors; each Sequent NUMA-Q
node includes four Pentium II processors. The nodes are
interconnected by means of some communications facility,
which could be a switching mechanism, a ring, or some other
networking facility. Each node in the CC-NUMA system includes
some main memory. From the point of view of the processors,
however, there is only a single addressable memory, with each
location having a unique system-wide address.
When a processor initiates a memory access, if the requested
memory location is not in that processor’s cache, then the L2
cache initiates a fetch operation. If the desired line is in the local
portion of the main memory, the line is fetched across the local
bus. If the desired line is in a remote portion of the main
memory, then an automatic request is sent out to fetch that line
across the interconnection network, deliver it to the local bus,
and then deliver it to the requesting cache on that bus. All of
this activity is automatic and transparent to the processor and its
cache. In this configuration, cache coherence is a central
concern. Although implementations differ as to details, in
general terms we can say that each node must maintain some
sort of directory that gives it an indication of the location of
various portions of memory and also cache status information.
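The decision cascade described above can be sketched with stub functions standing in for the hardware actions; all names here are illustrative, not a real interface:

```c
/* Hypothetical CC-NUMA load path with stubbed hardware actions.
 * Build: gcc ccnuma.c -o ccnuma */
#include <stdio.h>
#include <stdbool.h>

#define LOCAL_NODE 0

/* --- stubs standing in for hardware; illustrative only --- */
static bool l2_lookup(unsigned long addr, long *v) { (void)addr; (void)v; return false; }
static int  owning_node(unsigned long addr) { return (addr >> 30) & 1; }
static long fetch_local(unsigned long addr) { (void)addr; printf("local bus fetch\n"); return 1; }
static long fetch_remote(unsigned long addr, int node) {
    (void)addr; printf("interconnect fetch from node %d\n", node); return 2;
}

/* The decision cascade: cache, then local memory, then a remote node. */
static long load_word(unsigned long addr) {
    long v;
    if (l2_lookup(addr, &v)) return v;            /* cache hit */
    if (owning_node(addr) == LOCAL_NODE)
        return fetch_local(addr);                 /* local portion of memory */
    return fetch_remote(addr, owning_node(addr)); /* transparent remote fetch */
}

int main(void) {
    load_word(0x1000UL);        /* falls in the local node's memory */
    load_word(1UL << 30);       /* falls in a remote node's memory */
    return 0;
}
```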
N.U.M.A.
 It can deliver effective performance at higher levels of parallelism
than SMP, without requiring major software changes.
 Bus traffic on any individual node is limited to a demand that the bus
can handle.
 If many of the memory accesses are to remote nodes, however,
performance begins to break down.
 A CC-NUMA system does not transparently look like an SMP;
software changes will be required to move an operating system and
applications from an SMP to a CC-NUMA system.
 Ease of use is therefore a concern.
Thanks
:)