Distributed Memory Architecture
MS(CS) - I
Hafsa Habib
Syeda Haseeba Khanam
Amber Azhar
Zainab Khalid
Lahore College for Women University
Department of Computer Science
Content
● MIMD processor classification
● Distributed MIMD architecture
○ Basic difference between DM-MIMD and SM-MIMD
● Communication Techniques of DM-MIMD
● Major classification of DM-MIMD
○ NUMA
○ MPP
○ Cluster
● Pros and Cons of DM-MIMD over SM-MIMD architecture
○ Scalability
○ Issues in scalability
2
MIMD Architecture: Classification
3
Non-Shared MIMD Architecture (DM-MIMD)
Non-Shared MIMD Architecture
● Also called Distributed Memory MIMD or Message Passing MIMD
Computers or Loosely coupled MIMD
● Processors have their own local memory
○ Memory addresses of one processor do not map onto other processors
○ No concept of a global address space
● Each processor operates independently because of its own local memory
○ Changes in one processor’s local memory have no effect on another
processor’s local memory
○ Therefore cache synchronization and cache coherency do not apply.
● Inter-process communication is done by Message Passing.
5
DM-MIMD vs SM-MIMD
6
DM-MIMD vs SM-MIMD
DM-MIMD
● Private physical address space
for each processor
● Data must be explicitly assigned
to the private address space
● Communication/synchronization
via network by Message Passing
● Concept of cache coherency
does not apply because there is
no global address space
SM-MIMD
● Global address space shared by
all
● Data is implicitly assigned to the
address space.
● Cooperate by reading/writing
the same shared variables
(sketched below)
● Communication through a
shared bus
● Concept of cache coherency
applies due to the shared global
address space
7
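To make this contrast concrete, below is a small illustrative C sketch (not part of the original slides) of the SM-MIMD side: two threads cooperate implicitly by writing and reading the same shared variable in a global address space. All names are invented for illustration; the DM-MIMD counterpart, explicit message passing, is sketched later with MPI.

```c
/* Illustrative sketch only: implicit communication in SM-MIMD.
 * One thread stores to a shared variable, another loads it;
 * no explicit send/receive is involved. */
#include <pthread.h>
#include <stdio.h>

static int shared_value;                               /* lives in the global address space */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    shared_value = 42;                                  /* implicit "communication": a plain store */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(t, NULL);

    pthread_mutex_lock(&lock);
    printf("consumer read %d from shared memory\n", shared_value);   /* a plain load */
    pthread_mutex_unlock(&lock);
    return 0;
}
```

Compile with cc -pthread. In a DM-MIMD machine there is no such shared variable; the value would have to be shipped explicitly in a message over the network.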
Content
● MIMD processor classification
● Distributed MIMD architecture
○ Basic difference between DM-MIMD and SM-MIMD
● Communication Technique
● Major classification of DM-MIMD
○ NUMA
○ MPP
○ Cluster
● Pros and Cons of DM-MIMD over SM-MIMD architecture
○ Scalability
○ Issues in scalability
8
Communication Technique
Communication in DM
Architecture
● Requires a communication
NETWORK to connect
inter-processor memories.
● Communication and
synchronization are done through
the Message Passing Model.
● Processors share data by explicitly
sending and receiving information.
● Coordination is built into message
passing primitives
○ message SEND and message
RECEIVE
10
Why does the DM
Architecture use the Message
Passing Model?
In a distributed memory architecture there is no
global memory, so it is necessary to move data
from one local memory to another by means of
message passing.
11
Message Passing Model
● Communication via
Send/Receive
○ Through Interconnection
Network
● Data is packed into larger
packets
● Send transmits a message to a
destination processor
● Receive indicates that a
processor is ready to accept a
message from a source
processor (sketched below)
12
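As a concrete, purely illustrative sketch of the SEND/RECEIVE primitives above, here is a minimal C program using MPI. The slides do not prescribe a particular library, so treat the API choice and the data being exchanged as assumptions.

```c
/* Minimal sketch of explicit message passing: rank 0 sends an array
 * to rank 1, which posts a matching receive. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 1; }   /* needs at least two processors */

    double data[4] = {1.0, 2.0, 3.0, 4.0};

    if (rank == 0) {
        /* SEND: explicitly ship the data to destination processor 1 */
        MPI_Send(data, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* RECEIVE: indicate readiness to accept a message from source 0 */
        MPI_Recv(data, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %.1f ... %.1f\n", data[0], data[3]);
    }

    MPI_Finalize();
    return 0;
}
```

With a typical MPI installation this would be launched across processors with something like mpirun -np 2 ./a.out.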
Message Passing Model (cont’d)
● When a process interacts with another, two requirements have to be satisfied.
○ Synchronization and Communication
● Synchronization in message passing model is either asynchronous or
synchronous
○ If asynchronous, no acknowledgement is required at either end
(sender or receiver)
■ Sender and receiver don’t wait for each other and can carry on their
own computations while the transfer of messages is being done.
○ If synchronous, an acknowledgement is required.
■ Both processors have to wait for each other while transferring the
message. (one blocks until the second is ready)
13
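A hedged sketch of the two synchronization styles, again with MPI as an assumed message-passing layer: MPI_Ssend approximates the synchronous case (it completes only once the matching receive has started), while MPI_Isend/MPI_Irecv approximate the asynchronous case (both sides return immediately, keep computing, and complete the transfer later).

```c
/* Sketch of synchronous vs asynchronous message passing; run with two
 * processes (e.g. mpirun -np 2). */
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double buf[8] = {0};

    /* Synchronous: MPI_Ssend completes only once the matching receive has
     * started, so sender and receiver effectively wait for each other. */
    if (rank == 0)
        MPI_Ssend(buf, 8, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, 8, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Asynchronous: the non-blocking calls return immediately, each side can
     * carry on with its own computation, and the transfer completes later. */
    MPI_Request req = MPI_REQUEST_NULL;
    if (rank == 0)
        MPI_Isend(buf, 8, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, &req);
    else if (rank == 1)
        MPI_Irecv(buf, 8, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &req);

    /* ... independent computation could overlap with the transfer here ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* complete the asynchronous transfer */

    MPI_Finalize();
    return 0;
}
```

The non-blocking pair is what lets sender and receiver overlap communication with their own computation, as described on the slide.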
14
Pros and Cons of Message Passing Model
Pros
● The advantage for programmers is that
communication is explicit, so there are
fewer “performance surprises” than with
the implicit communication in
cache-coherent SMPs.
● Synchronization is naturally associated
with sending messages, reducing the
possibility of errors introduced by
incorrect synchronization.
● Much easier for hardware designers to
design.
Cons
● Message sending and receiving is much
slower.
● It’s harder to port a sequential program
to a message passing multiprocessor.
15
Communication in DM Architecture
Vs
Communication in SM Architecture
1. Explicit vs. implicit communication
   Distributed Memory Architecture: explicit, via messages
   Shared Memory Architecture: implicit, via memory operations

2. Who is responsible for carrying out the communication task?
   Distributed Memory Architecture: the programmer is responsible for
   sending and receiving data
   Shared Memory Architecture: sending and receiving are automatic; the
   system is responsible for placing data in the cache, and the programmer
   just loads from and stores to memory

3. Synchronization
   Distributed Memory Architecture: automatic
   Shared Memory Architecture: can be achieved using different mechanisms

4. Protocols
   Distributed Memory Architecture: fully under programmer control
   Shared Memory Architecture: hidden within the system
17
Content
● MIMD processor classification
● Distributed MIMD architecture
○ Basic difference between DM-MIMD and SM-MIMD
● Communication Techniques of DM-MIMD
● Major classification of DM-MIMD
○ NUMA
○ MPP
○ Cluster
● Pros and Cons of DM-MIMD over SM-MIMD architecture
○ Scalability
○ Issues in scalability
18
Classification of Distributed Memory Architecture
Types of Distributed Memory Architecture
DM-MIMD
Architecture
NUMA      Clusters      MPP
20
NUMA (Non-Uniform memory Access)
21
NUMA (Non-Uniform memory Access)
● NUMA is a computer memory design
used in multiprocessing, where the
memory access time depends on the
memory location relative to the
processor.
● Under NUMA, a processor can access its
own local memory faster than non-local
memory (memory local to another
processor or memory shared between
processors).
● The benefits of NUMA are limited to
particular workloads, notably on servers
where the data is often associated
strongly with certain tasks or users.
● There are two morals to this performance
story.
● The first is that even a single, already
commonplace, 32-bit processor is
starting to push the limits of standard
memory performance.
● The second is that differences even among
conventional memory types play a role in
overall system performance. So it should
come as no surprise that NUMA support
is now in server operating systems, e.g.
Microsoft’s Windows Server 2003 and the
Linux 2.6 kernel.
22
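As a small illustration of the local-versus-remote distinction, here is a sketch using Linux’s libnuma. The slides only mention that NUMA support exists in the Linux 2.6 kernel; the specific calls and sizes below are our own assumptions.

```c
/* Illustrative sketch: placing data on the local vs a remote NUMA node.
 * Link with -lnuma. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return EXIT_FAILURE;
    }

    size_t bytes = 64 * 1024 * 1024;

    /* Allocate on the node the calling thread is running on (fast, local)... */
    double *local = numa_alloc_local(bytes);

    /* ...versus explicitly on the highest-numbered node (potentially remote, slower). */
    double *remote = numa_alloc_onnode(bytes, numa_max_node());

    /* Timing loops over the two buffers from this CPU would expose the
     * local/remote access-time gap that defines non-uniform memory access. */

    numa_free(local, bytes);
    numa_free(remote, bytes);
    return EXIT_SUCCESS;
}
```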
MPP (Massively Parallel Processor)
23
MPP (Massively Parallel Processor)
24
MPP (Massively Parallel Processor Architecture)
25
What is a Cluster
● Network of independent computers
○ Each has private memory and OS
○ Connected using I/O system
E.g., Ethernet/switch, Internet
● Independent computers in a cluster are called Nodes
○ Master and computing Nodes
● Cluster Middleware is required
○ e.g., a Message Passing Interface (MPI) implementation
● Node management has to be considered
● The cluster appears as a single system to the user
26
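A minimal sketch of the master/computing-node split using MPI as the cluster middleware (MPI is the middleware named on the slide; the work-splitting scheme below is invented for illustration):

```c
/* Rank 0 acts as the master node; the other ranks act as computing nodes
 * that each handle one chunk of the problem. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each computing node produces a partial result independently... */
    double partial = (double)rank;               /* stand-in for real work */

    /* ...and the master node combines the partial results into one answer. */
    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Master (rank 0) combined results from %d nodes: %.1f\n",
               size, total);

    MPI_Finalize();
    return 0;
}
```

Launched with one process per node (e.g. mpirun -np 8 ./program), the cluster appears to the user as a single program producing one combined result.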
Clusters
Clusters split a problem into smaller tasks
that are executed concurrently
Why?
● Absolute physical limits of
hardware components
● Economic reasons – more
complex = more expensive
● Performance limits – doubling the
frequency does not double performance
● Large applications – demand too
much memory & time
Advantages:
Increased speed & optimized
resource utilization; largely
independent of hardware
Disadvantages:
Complex programming models –
difficult development
Applications:
Suitable for applications with
independent tasks, e.g.
supercomputers, web
servers, databases, simulations
27
Clusters
28
Clusters vs MPP
Similar to MPPs
● Commodity processor and memory
○ Processor performance must be maximized
● Memory Hierarchy includes remote memory
○ Non Uniform Memory Access
● No shared memory - message passing
29
Clusters vs MPPs
Clusters
● In a cluster, each machine is largely
independent of the others in terms
of memory, disk, etc.
● They are interconnected using
some variation on normal
networking.
● The cluster exists mostly in the
mind of the programmer and how
s/he chooses to distribute the
work.
● Best to use in servers with multiple
independent tasks.
MPPs
● In a Massively Parallel Processor,
there really is only one machine
with thousands of CPUs tightly
interconnected with the I/O
subsystem.
● MPPs have exotic memory
architectures to allow extremely
high speed exchange of
intermediate results with
neighboring processors.
● MPPs are of use only on algorithms
that are embarrassingly parallel.
30
Content
● MIMD processor classification
● Distributed MIMD architecture
○ Basic difference between DM-MIMD and SM-MIMD
● Communication Technique
● Major classification of DM-MIMD
○ NUMA
○ MPP
○ Cluster
● Pros and Cons of DM-MIMD over SM-MIMD architecture
○ Issues in DM Architecture
○ Scalability
31
Pros & Cons of DM-MIMD over SM-MIMD
Pros of DM-MIMD over SM-MIMD
DM-MIMD
● Memory is scalable with the
number of processors: increase the
number of processors and the size
of memory increases
proportionately.
● Each processor can rapidly access
its own memory without
interference and without the
overhead of trying to maintain
global cache coherency.
● Cost effectiveness: can use
commodity, off-the-shelf
processors and networking.
SM-MIMD
● Lack of scalability between
memory and CPUs: adding more
CPUs can geometrically increase
traffic on the shared memory–CPU
path, and geometrically increase
traffic associated with cache
memory management.
● Expense: it becomes increasingly
difficult and expensive to design
and produce shared memory
machines with ever-increasing
numbers of processors.
33
Cons of DM-MIMD over SM-MIMD
DM-MIMD
● Non-uniform memory access
times: data residing on a
remote node takes longer to
access than local data.
● The programmer is
responsible for many of the
details associated with data
communication between processors.
SM-MIMD
● Data sharing between tasks
is both fast and uniform due
to the proximity of memory
to CPUs.
● Global address space
provides a user-friendly
programming perspective to
memory.
34
Issues of DM Architecture
Latency and bandwidth for accessing distributed memory are the main memory
performance issues:
● Efficiency in parallel processing is usually related to the ratio of time for calculation to
time for communication; the higher the ratio, the higher the performance.
● The problem is even more severe when access to distributed memory is needed, since
there is an extra level in the memory hierarchy, with latency and bandwidth that can
be much worse than for local memory access.
35
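To make the calculation-to-communication ratio concrete, here is one simple model with invented numbers (an illustration, not taken from the slides):

```latex
% Efficiency as the fraction of time spent on useful calculation.
E = \frac{t_{\mathrm{calc}}}{t_{\mathrm{calc}} + t_{\mathrm{comm}}},
\qquad \text{e.g. } t_{\mathrm{calc}} = 9\ \mathrm{ms},\ t_{\mathrm{comm}} = 1\ \mathrm{ms}
\;\Rightarrow\; E = \frac{9}{9+1} = 0.9\ (90\%).
```

Remote-memory access inflates t_comm through both higher latency and lower bandwidth, which is why the extra level in the memory hierarchy reduces efficiency.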
Scalability and its issues
A scalable architecture is an architecture that can scale up to meet increased workloads. In other
words, if the workload all of a sudden exceeds the capacity of your existing software + hardware
combination, you can scale up the system (software + hardware) to meet the increased workload.
Scalability to more processors is the key issue
● Access times to “distant” processors should not be very much slower than
access to “nearby” processors, since non-local and collective (all-to-all)
communication is important for many programs. This can be a problem for
large parallel computers (hundreds or thousands of processors). Many
different approaches to network topology and switching have been tried in
attempting to alleviate this problem.
36
We welcome your questions, suggestions and comments!
37

Editor's Notes

  • #2 http://slideplayer.com/slide/8893733/
  • #18 DM: Protocols are complex for the programmer, causing communication to be treated as an I/O call. SM: Communication can be close to the hardware because of the shared bus system, and if we modify the shared memory hardware then communication will be fast.
  • #29 http://www.brainkart.com/article/Computer-Clusters-and-MPP-Architectures_11316/
  • #30 Commodity computing involves the use of large numbers of already-available computing components for parallel computing, to get the greatest amount of useful computation at low cost
  • #31 However, if you have such a problem, then an MPP can be shockingly fast.
  • #36 Latency is the amount of time a message takes to traverse a system. In a computer network, it is an expression of how much time it takes for a packet of data to get from one designated point to another. It is sometimes measured as the time required for a packet to be returned to its sender.