Scalable Multiprocessors
SCALABILITY
• Almost all computers allow the capability of the system to be
increased in some form, for example by adding memory, I/O
cards, disks or upgraded processor(s), but the increase
typically has hard limits
• A scalable system attempts to avoid inherent design limits on
the extent to which resources can be added to the system
• Four aspects of scalability:
– How does the bandwidth or throughput of the system increase with additional
processors?
– How does the latency or time per operation increase?
– How does the cost of the system increase?
– How do we actually package the systems and put them together?
Bandwidth Scaling
• If a large number of processors are to exchange
information simultaneously with many other
processors or memories, a large number of
independent wires must connect them.
• Thus scalable machines must be organized in
the manner shown in figure (next slide) where
a large number of processor modules and
memory modules are connected together by
independent wires through a large number of
switches
• A switch may be realized by a bus, a crossbar or even a collection
of multiplexers
• The number of outputs (or inputs) to the switch is called the degree of
the switch
• Switches are limited in scale but may be interconnected to form
large configurations, that is, networks
• Controllers are also available to determine which inputs are to be
connected to which outputs at each instant in time
• A network switch is a more general-purpose device, in which the
information presented at the input is enough for the switch
controller to determine the proper output without consulting all
the nodes
• Pairs of modules are connected by routes through network switches
• The most common structure for scalable
machines is illustrated by the generic
architecture shown in fig (next slide)
• Here one or more processors are packaged
together with one or more memory modules
and a communication assist as an easily
replicated unit, which is called a node
• The intranode switch is typically a high-
performance bus
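To illustrate how a network switch can choose its output from information carried with the packet alone, here is a minimal Python sketch of destination-tag routing, one classic scheme for an omega network built from 2×2 switches. The network shape and the function name are assumptions for illustration, not the machine shown in the figures.

```python
def destination_tag_route(dest: int, num_nodes: int) -> list[int]:
    """Return the output port (0 = upper, 1 = lower) chosen at each stage
    of a hypothetical omega network of 2x2 switches.

    Each switch decides using only the destination bits carried in the
    packet header; no global controller or other nodes are consulted.
    """
    stages = num_nodes.bit_length() - 1  # log2(num_nodes) for a power of two
    if stages < 0 or 1 << stages != num_nodes:
        raise ValueError("num_nodes must be a power of two")
    if not 0 <= dest < num_nodes:
        raise ValueError("destination out of range")
    # Stage i examines destination bit (stages - 1 - i), most significant first.
    return [(dest >> (stages - 1 - i)) & 1 for i in range(stages)]


# 8 nodes -> 3 stages; destination 5 = 0b101 leaves on ports 1, 0, 1.
route = destination_tag_route(5, 8)
```

For 1,024 nodes the route has log2 1024 = 10 stages, matching the log2 n input-to-output distance assumed in Prob 7.1.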
In the dancehall configuration, processing nodes are separated from
memory nodes by the network
• If the memory modules are on the opposite side
of the interconnect, as in fig (previous slide) the
network bandwidth requirement scales linearly
with the number of processors, even when no
communication occurs between processes
• Providing adequate bandwidth scaling may not
be enough for the computational performance to
scale perfectly since the access latency increases
with the number of processors
• By distributing the memories across the
processors, all processes can access local memory
with fixed latency, independent of the number of
processors; thus the computational performance
of the system can scale perfectly
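The linear growth of network traffic in the dancehall case can be made concrete with a back-of-the-envelope Python sketch. The per-processor reference bandwidth (100 MB/s) and the remote fraction (10%) used below are hypothetical numbers, not measurements, and the function name is illustrative.

```python
def network_demand(processors: int, bytes_per_sec_per_proc: float,
                   remote_fraction: float, dancehall: bool) -> float:
    """Aggregate traffic (bytes/s) the interconnect must carry.

    In a dancehall machine every memory reference crosses the network,
    so demand grows linearly with the processor count even when the
    processes never communicate. With memory distributed to the nodes,
    only the fraction of references that go to remote memory enters
    the network.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    fraction = 1.0 if dancehall else remote_fraction
    return processors * bytes_per_sec_per_proc * fraction


dancehall_64 = network_demand(64, 100e6, 0.1, dancehall=True)
distributed_64 = network_demand(64, 100e6, 0.1, dancehall=False)
```

With those numbers, 64 processors need 6.4 GB/s of network bandwidth in the dancehall arrangement but about 0.64 GB/s when only 10% of references are remote.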
The following assumptions are made to achieve scalable
bandwidth:
• It must be possible to have a very large number of
concurrent transactions using different wires
• They are initiated independently and without global
arbitration
• The effects of a transaction (such as changes of state)
are directly visible only by the nodes involved in the
transaction
• The effects may eventually become visible to other
nodes as they are propagated by additional
transactions
• Although it is possible to broadcast information to all
nodes, broadcast bandwidth (i.e., the rate at which
broadcasts can be performed) does not increase with
the number of nodes
Latency Scaling
The time to transfer n bytes between two nodes
is given by
$$T(n) = \text{Overhead} + \text{Channel time} + \text{Routing delay}$$
where Overhead is the processing time in
initiating or completing the transfer,
Channel time is $n/B$ (where $B$ is the bandwidth
of the thinnest channel), and
Routing delay is a function $f(H, n)$ of the number
of routing steps or hops $H$ in the transfer and
the number of bytes transferred
Prob 7.1: Many classic networks are
constructed out of fixed-degree switches in a
configuration, or topology, such that for n nodes
the distance from any network input to any
network output is $\log_2 n$ and the total number of
switches is $\alpha n \log n$ for some small constant $\alpha$.
Assume the overhead is 1 µs per message, the
link bandwidth is 64 MB/s and the router delay
is 200 ns per hop. How much does the time for
a 128-byte transfer increase as the machine is
scaled from 64 to 1,024 nodes?
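Solution: At 64 nodes, six hops are required, so
$$T_{64}(128) = 1.0\,\mu s + \frac{128\,\text{B}}{64\,\text{B}/\mu s} + 6 \times 0.2\,\mu s = 4.2\,\mu s$$
This increases to 5 µs on a 1,024-node
configuration (ten hops). Thus, the latency increases by
less than 20% with a 16-fold increase in
machine size. Even with this small transfer
size, a store-and-forward delay would add
2 µs (the time to buffer 128 bytes) to the
routing delay per hop. Thus the latency would
be
$$T^{sf}_{64}(128) = 1.0\,\mu s + 2\,\mu s + 6 \times (0.2 + 2)\,\mu s = 16.2\,\mu s$$
at 64 nodes and
$$T^{sf}_{1024}(128) = 1.0\,\mu s + 2\,\mu s + 10 \times (0.2 + 2)\,\mu s = 25\,\mu s$$
at 1,024 nodes.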
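The arithmetic of Prob 7.1 can be checked with a short Python sketch of the latency model T(n) = Overhead + Channel time + Routing delay. It assumes, as the problem does, H = log2 n hops for n nodes (n a power of two), a fixed per-hop router delay, and 64 MB/s = 64 bytes/µs; for store-and-forward it follows the solution's rule of adding the packet buffering time to the routing delay at every hop. The function name and defaults are illustrative.

```python
import math


def transfer_time_us(n_bytes: int, nodes: int, overhead_us: float = 1.0,
                     bw_bytes_per_us: float = 64.0, hop_delay_us: float = 0.2,
                     store_and_forward: bool = False) -> float:
    """T(n) = overhead + channel time + routing delay, in microseconds."""
    hops = math.log2(nodes)                  # H = log2 n for this topology
    if not hops.is_integer():
        raise ValueError("nodes must be a power of two")
    channel_time = n_bytes / bw_bytes_per_us  # n / B, paid once end to end
    per_hop = hop_delay_us
    if store_and_forward:
        # Each router buffers the whole packet before forwarding it.
        per_hop += channel_time
    return overhead_us + channel_time + hops * per_hop


cut_64 = transfer_time_us(128, 64)        # about 4.2 us
cut_1024 = transfer_time_us(128, 1024)    # about 5.0 us
sf_64 = transfer_time_us(128, 64, store_and_forward=True)
sf_1024 = transfer_time_us(128, 1024, store_and_forward=True)
```

Because the channel time is paid only once without store-and-forward, the 16-fold machine growth changes only the small per-hop term, which is why the increase stays under 20%.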
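Cost Scaling
• The cost of the system may be viewed as a fixed cost for the system
infrastructure plus an incremental cost of
adding processors and memory to the system:
$$\text{Cost}(p, m) = \text{Fixed cost} + \text{Incremental cost}(p, m)$$
where p is the number of processors and m is the amount of memory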
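The split into a fixed infrastructure cost and an incremental per-processor and per-memory cost can be sketched in Python as follows. The linear incremental term and all prices are hypothetical; a real network adds switch cost that can grow slightly faster than linearly (for example the α n log n switches of Prob 7.1).

```python
def system_cost(processors: int, memory_gb: int, fixed_cost: float,
                cost_per_processor: float, cost_per_gb: float) -> float:
    """Cost(p, m) = fixed cost + incremental cost(p, m).

    The incremental term is assumed linear in processors and memory here;
    this is a simplification for illustration, not a pricing model.
    """
    incremental = processors * cost_per_processor + memory_gb * cost_per_gb
    return fixed_cost + incremental


# Hypothetical prices in arbitrary cost units, 4 GB of memory per processor.
small = system_cost(16, 64, 50_000, 2_000, 100)      # 88,400
large = system_cost(256, 1024, 50_000, 2_000, 100)   # 664,400
```

Per processor, the small configuration costs 5,525 units and the large one about 2,595, showing the fixed cost being amortized as the system grows.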
Realizing Programming Models
• Here we examine what is required to
implement programming models on large
distributed-memory machines
• These machines have been most strongly
associated with message-passing
programming models
• Shared address space programming models
have become increasingly important and
well represented
• A communication abstraction defines the set of
communication primitives provided to the user
• These could be realized directly in hardware, via system
software, or through some combination of the two, as shown
in fig below
• In large-scale parallel machines the
programming model is realized in a similar
manner, except that the primitive events are
transactions across the network, that is,
network transactions rather than bus
transactions
• A network transaction is a one-way transfer of
information from an output buffer at the
source to an input buffer at the destination
that causes some kind of action at the
destination, the occurrence of which is not
directly visible at the source, as shown in fig
(next slide)
Primitive Network Transactions
• Before starting a bus transaction, a protection
check has been performed as part of the
virtual-to-physical address translation
• The format of information in a bus transaction
is determined by the physical wires of the bus,
i.e. the data lines, address lines and command
lines
• The information to be transferred onto the
bus is held in special output registers viz.,
address, command and data registers until it
can be driven onto the bus
• A bus transaction begins with arbitration for
the medium
• Most buses employ a global arbitration
scheme where a processor requesting a
transaction asserts a bus request line and
waits for the corresponding bus grant
• The destination of the transaction is implicit in
the address
• Each module on the bus is configured to
respond to a set of physical addresses
• All modules examine the address and one
responds to the transaction
• If none responds, the bus controller detects the
time-out and aborts the transaction
• Each module includes a set of input registers,
capable of buffering any request to which it might
respond
• Each bus transaction involves a request followed
by a response
• In the case of a read, the response is the data and
an associated completion signal
• For a write it is just the completion
acknowledgement
• In either case, both the source and destination
are informed of the completion of the
transaction
• In split-transaction buses, the response phase
of the transaction may require rearbitration
and may be performed in a different order
than the requests
• Care is required to avoid deadlock with split
transactions because a module on the bus
may be both requesting and servicing
transactions
• The module must continue servicing bus
requests and accept replies while it is
attempting to present its own request
• The bus design ensures that, for any
transaction that might be placed on the bus,
sufficient input buffering exists to accept the
transaction at the destination
• This can be accomplished by providing enough
resources or by adding a negative
acknowledgement signal (NACK)
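The bus behavior described above (address decoding by every module, a time-out when no module responds, and a NACK when the destination has no free input register) can be sketched in Python. This is a simplified model under stated assumptions: arbitration and the separate response phase are omitted, and the class names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    base: int          # first physical address this module responds to
    size: int          # size of its address range
    capacity: int      # number of input registers for buffering requests
    inbox: deque = field(default_factory=deque)

    def responds_to(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


class Bus:
    def __init__(self, modules):
        self.modules = modules

    def transaction(self, command: str, addr: int, data=None) -> str:
        # All modules examine the address; the one whose range matches responds.
        target = next((m for m in self.modules if m.responds_to(addr)), None)
        if target is None:
            return "TIMEOUT"   # bus controller detects the time-out and aborts
        if len(target.inbox) >= target.capacity:
            return "NACK"      # no free input register; requester must retry
        target.inbox.append((command, addr, data))
        return "ACK"           # request accepted into an input register


mem = Module("mem0", base=0x0000, size=0x1000, capacity=1)
bus = Bus([mem])
```

With one input register, a second write before the first is drained is NACKed, and an address outside every module's range times out.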
Issues present in a network transaction
• Protection: As the number of components
becomes larger, the coupling between
components becomes looser and the individual
components become more complex, there are limits
to how much each component can trust the
others to operate correctly. In a scalable
system, individual components will often
perform checks on the network transaction so
that an errant program or faulty hardware
component cannot corrupt other components
of the system
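A one-way network transaction with a destination-side protection check might look like the following Python sketch. It assumes a hypothetical trust list per node and instantaneous delivery into the input buffer; the names `Node`, `send_store`, and `trusted_sources` are illustrative only.

```python
from collections import deque


class Node:
    """Hypothetical node with an input buffer and a set of trusted sources."""

    def __init__(self, name, trusted_sources):
        self.name = name
        self.trusted_sources = set(trusted_sources)
        self.input_buffer = deque()
        self.memory = {}
        self.rejected = 0

    def deliver(self, packet):
        # The network places the packet in the destination's input buffer.
        self.input_buffer.append(packet)

    def service(self):
        # Destination-side action; the source is never told what happened.
        while self.input_buffer:
            src, addr, value = self.input_buffer.popleft()
            if src not in self.trusted_sources:
                self.rejected += 1   # protection check: discard the errant store
                continue
            self.memory[addr] = value


def send_store(network, src, dest, addr, value):
    """One-way transaction: the caller gets no indication of delivery or outcome."""
    network[dest].deliver((src, addr, value))


net = {"B": Node("B", trusted_sources={"A"})}
send_store(net, "A", "B", 0x10, 42)   # accepted at B
send_store(net, "X", "B", 0x20, 7)    # rejected by B's check
net["B"].service()
```

The sender learns nothing from `send_store`; any confirmation would need a separate response transaction.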
Format: Most network links are narrow, so the
information associated with a transaction is
transferred as a serial stream. Typical links are a
few (1 to 16) bits wide. The format of the
transaction is dictated by how the information is
serialized onto the link. Thus there is a great deal
of flexibility in this aspect of design. The
information in a network transaction is an
envelope with more information inside. The
envelope includes information pertaining to the
physical network to get the packet from its
source to its destination port. Some networks
are designed to deliver only fixed-size packets;
others can deliver variable-size packets.
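Serializing a packet onto a narrow link can be sketched in Python. The envelope layout (one byte of destination port, one byte of payload length, then the payload) and the 4-bit link width are hypothetical choices; this sketch only handles widths that divide 8.

```python
LINK_WIDTH = 4  # bits per link transfer (hypothetical; the text cites 1 to 16)


def build_packet(dest_port: int, payload: bytes) -> bytes:
    # Envelope: [destination port][payload length], followed by the payload.
    if not (0 <= dest_port < 256 and len(payload) < 256):
        raise ValueError("field does not fit in one byte")
    return bytes([dest_port, len(payload)]) + payload


def serialize(packet: bytes, width: int = LINK_WIDTH) -> list[int]:
    """Split each byte into width-bit pieces, most significant piece first."""
    if 8 % width != 0:
        raise ValueError("this sketch only supports widths that divide 8")
    mask = (1 << width) - 1
    return [(byte >> shift) & mask
            for byte in packet
            for shift in range(8 - width, -1, -width)]


def deserialize(pieces: list[int], width: int = LINK_WIDTH) -> bytes:
    """Reassemble bytes at the receiving end of the link."""
    per_byte = 8 // width
    out = bytearray()
    for i in range(0, len(pieces), per_byte):
        value = 0
        for piece in pieces[i:i + per_byte]:
            value = (value << width) | piece
        out.append(value)
    return bytes(out)


pkt = build_packet(3, b"hi")   # bytes 0x03 0x02 0x68 0x69
stream = serialize(pkt)        # [0, 3, 0, 2, 6, 8, 6, 9] on a 4-bit link
```

The header travels first, so a switch can start choosing an output before the payload has arrived.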
Output Buffering: The source must provide
storage to hold information that is to be
serialized onto the link, either in registers,
FIFOs or memory. Since network transactions
are one-way and can potentially be pipelined,
it may be desirable to provide a queue of
output registers. If the packet format is
variable up to some moderate size, a similar
approach may be adopted where each entry
in the output buffer is of variable size. If a
packet can be quite long, then typically the
output controller contains a buffer of
descriptors, pointing to the data in memory.
Media arbitration: There is no global arbitration
for access to the network and many network
transactions can be initiated simultaneously.
Initiation of the network transaction places an
implicit claim on resources in the
communication path from the source to the
destination as well as on resources at the
destination. These resources are potentially
shared with other transactions. Local
arbitration is performed at the source to
determine whether or not to initiate the
transaction. The resources are allocated
incrementally as the message moves forward.
Destination name and routing:
The source must be able to specify enough
information to cause the transaction to be
routed to the appropriate destination. There
are many variations in how routing is specified
and performed, but basically the source
performs a translation from some logical
name for the destination to some form of
physical address.
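As one illustration of translating a logical destination name into physical routing information, this Python sketch assumes a hypothetical 2-D mesh with nodes numbered row-major and uses dimension-order (X, then Y) routing. Restricting routes this way is a well-known deadlock-free choice on a mesh, the kind of route restriction mentioned under deadlock avoidance; the function names are illustrative.

```python
def node_to_coords(node_id: int, width: int) -> tuple[int, int]:
    # Hypothetical physical layout: nodes numbered row-major on a 2-D mesh.
    return node_id % width, node_id // width


def dimension_order_route(src: int, dest: int, width: int) -> list[str]:
    """Translate a logical destination into per-hop directions.

    The packet moves fully along X first, then along Y.
    """
    sx, sy = node_to_coords(src, width)
    dx, dy = node_to_coords(dest, width)
    route = ["+x" if dx > sx else "-x"] * abs(dx - sx)
    route += ["+y" if dy > sy else "-y"] * abs(dy - sy)
    return route


# On a 4-wide mesh, node 0 is at (0, 0) and node 6 is at (2, 1).
path = dimension_order_route(0, 6, width=4)   # ['+x', '+x', '+y']
```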
• Input buffering: At the destination, the
information in the network transaction must
be transferred from the physical link into some
storage element. This may be simple registers
or a queue or it may be delivered directly into
memory. The input buffer is in some sense a
shared resource used by many remote
processors.
• Action: The action taken at the destination
may be very simple or complex. In either case,
it may involve initiating a response.
• Completion detection: The source has an
indication that the transaction has been delivered
into the network but usually no indication that it
has arrived at its destination. This completion
must be inferred from a response, an
acknowledgement or some additional
transaction.
• Transaction ordering: In a network the ordering
is quite weak. Some networks ensure that a
sequence of transactions from a given source to a
single destination will be seen in order at the
destination; others will not even provide this
assurance. In either case, no node can perceive
the global order.
• Deadlock avoidance: Most modern networks are
deadlock free as long as the modules on the
network continue to accept transactions. Within
the network, this may require restrictions on
permissible routes or other special precautions.
• Delivery guarantees: A fundamental decision in
the design of a scalable network is the behavior
when the destination buffer is full. This is clearly
an issue on an end-to-end basis since it is
necessary for the source to know whether the
destination input buffer is available when it is
attempting to initiate a transaction. It is also an
issue on a link-by-link basis within the network
itself.
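One common answer to the full-buffer question on a link-by-link basis is credit-based flow control: the sender holds one credit per free slot in the receiver's input buffer and stalls when it runs out, so packets wait upstream instead of being dropped. The Python sketch below assumes a single link with instantaneous credit return; the class name is hypothetical. Other designs instead drop or NACK the packet and rely on an end-to-end retry.

```python
from collections import deque


class CreditLink:
    """Link-level flow control: the sender never overruns the receiver's buffer."""

    def __init__(self, buffer_slots: int):
        self.credits = buffer_slots   # free slots the sender knows exist downstream
        self.receiver_buffer = deque()
        self.blocked = 0              # send attempts that had to wait

    def try_send(self, packet) -> bool:
        if self.credits == 0:
            self.blocked += 1         # back-pressure: keep the packet upstream
            return False
        self.credits -= 1
        self.receiver_buffer.append(packet)
        return True

    def receiver_consume(self):
        packet = self.receiver_buffer.popleft()
        self.credits += 1             # freed slot is returned to the sender as a credit
        return packet


link = CreditLink(buffer_slots=2)
link.try_send("a")
link.try_send("b")
link.try_send("c")   # refused: no credits until the receiver drains a slot
```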
Shared Address Space
• Realizing the shared address space
communication abstraction requires a two-
way request-response protocol, as shown in
fig (previous slide)
• A global address is decomposed into a module
number and a local address.
• For a read operation, a request is sent to the
designated module requesting a load of the
desired address and specifying enough
information to allow the result to be returned
to the requestor through a response network
transaction.
• A write is similar, except that the data is
conveyed with the address and command to
the designated module and the response is
merely an acknowledgement to the requestor
that the write has been performed. The
response informs the source that the request
has been received or serviced, depending on
whether it is generated before or after the
remote action.
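The request/response protocol can be sketched in Python, assuming a hypothetical global address layout in which the upper bits select the module and the lower 20 bits are the local address. The network is collapsed into direct method calls; the names `split_global`, `MemoryModule`, and `remote_access` are illustrative.

```python
MODULE_BITS = 4   # hypothetical: up to 16 memory modules
LOCAL_BITS = 20   # hypothetical: 1 MiB of local memory per module


def split_global(addr: int) -> tuple[int, int]:
    """Decompose a global address into (module number, local address)."""
    module = addr >> LOCAL_BITS
    local = addr & ((1 << LOCAL_BITS) - 1)
    if module >= 1 << MODULE_BITS:
        raise ValueError("address outside the global address space")
    return module, local


class MemoryModule:
    def __init__(self):
        self.cells = {}

    def handle(self, request):
        # Request transaction: (kind, local address, data, requester).
        kind, local, data, requester = request
        if kind == "read":
            return ("read_reply", requester, self.cells.get(local, 0))
        if kind == "write":
            self.cells[local] = data
            return ("write_ack", requester, None)  # sent after the write is performed
        raise ValueError(f"unknown request kind: {kind}")


def remote_access(modules, requester, kind, addr, data=None):
    module, local = split_global(addr)
    request = (kind, local, data, requester)                    # request transaction
    _reply_kind, dest, value = modules[module].handle(request)  # response transaction
    assert dest == requester   # the response is routed back to the requestor
    return value


modules = [MemoryModule() for _ in range(1 << MODULE_BITS)]
remote_access(modules, 0, "write", (3 << LOCAL_BITS) | 0x40, 99)
value = remote_access(modules, 0, "read", (3 << LOCAL_BITS) | 0x40)  # 99
```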
Message Passing
• A send/receive pair in the message-passing
model is conceptually a one-way transfer
from a source area specified by the source
user process to a destination area specified by
the destination user process.
• In addition, it embodies a pairwise
synchronization event between the two
processes.
• The Message Passing Interface (MPI) distinguishes
the notion of when a call to a send or receive
function returns from when a message
operation completes.
• A synchronous send completes once the
matching receive has executed; at that point the source
data buffer can be reused and the data is
guaranteed to arrive in the destination receive
buffer.
• A buffered send completes as soon as the
source data buffer can be reused,
independent of whether the matching receive
has been issued; the data may have been
transmitted or it may be buffered somewhere
in the system.
• Buffered send completion is asynchronous
with respect to the receiver process
• A receive completes when the message data is
present in the receive destination buffer.
• A blocking function, send or receive, returns
only after the message operation completes
• A nonblocking function returns immediately,
regardless of message completion, and
additional calls to a probe function are used to
detect completion
• The protocols are concerned only with
message operation and completion, regardless
of whether the functions are blocking
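The difference between when a call returns and when a send completes can be modeled in single-threaded Python. The `Channel` and `Request` classes are hypothetical and deliberately not the MPI API; they loosely mirror nonblocking buffered and synchronous sends (MPI_Ibsend, MPI_Issend), with `test()` standing in for the completion check (MPI_Test in MPI itself).

```python
from collections import deque


class Request:
    """Handle returned by a nonblocking send; test() reports completion."""

    def __init__(self, done: bool = False):
        self.done = done

    def test(self) -> bool:
        return self.done


class Channel:
    """Hypothetical model of one source/destination pair."""

    def __init__(self):
        self.pending = deque()   # (buffer or copy, send request) awaiting a receive

    def isend_buffered(self, buf: list) -> Request:
        # Data is copied into a system buffer, so the source buffer may be
        # reused at once: completion is independent of the receiver.
        req = Request(done=True)
        self.pending.append((list(buf), req))
        return req

    def isend_sync(self, buf: list) -> Request:
        # The data stays in the source buffer; the send completes only
        # when the matching receive has executed.
        req = Request(done=False)
        self.pending.append((buf, req))
        return req

    def recv(self) -> list:
        # Receive completes when the data is in the destination buffer.
        # In this model, only call it when a message is pending.
        source_buf, send_req = self.pending.popleft()
        data = list(source_buf)
        send_req.done = True   # the matching receive has now executed
        return data


ch = Channel()
b = ch.isend_buffered([1, 2])   # returns complete
s = ch.isend_sync([3])          # returns incomplete
first = ch.recv()               # [1, 2]
second = ch.recv()              # [3]; s.test() is now True
```

A blocking send would simply call the nonblocking version and wait until `test()` is true; reusing the source buffer before a synchronous send completes would change the data the receiver sees, which is why completion, not return, governs buffer reuse.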
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 

Scalable multiprocessors

  • 2. SCALABILITY • Almost all computers allow the capability of the system to be increased in some form, for example by adding memory, I/O cards, disks or upgraded processor(s), but the increase typically has hard limits • A scalable system attempts to avoid inherent design limits on the extent to which resources can be added to the system • Four aspects of scalability: – How does the bandwidth or throughput of the system increase with additional processors? – How does the latency or time per operation increase? – How does the cost of the system increase? – How do we actually package the systems and put them together?
  • 3. Bandwidth Scaling • If a large number of processors are to exchange information simultaneously with many other processors or memories, a large number of independent wires must connect them. • Thus scalable machines must be organized in the manner shown in figure (next slide) where a large number of processor modules and memory modules are connected together by independent wires through a large number of switches
  • 4.
  • 5. • A switch may be realized by a bus, a crossbar or even a collection of multiplexers • The number of outputs (or inputs) to the switch is called the degree of the switch • Switches are limited in scale but may be interconnected to form large configurations, that is, networks • Controllers are also available to determine which inputs are to be connected to which outputs at each instant in time • A network switch is a more general-purpose device, in which the information presented at the input is enough for the switch controller to determine the proper output without consulting all the nodes • Pairs of modules are connected by routes through network switches
  • 6. • The most common structure for scalable machines is illustrated by the generic architecture shown in fig (next slide) • Here one or more processors are packaged together with one or more memory modules and a communication assist as an easily replicated unit, which is called a node • The intranode switch is typically a high-performance bus
  • 7.
  • 8. In the dancehall configuration, processing nodes are separated from memory nodes by the network
  • 9. • If the memory modules are on the opposite side of the interconnect, as in fig (previous slide), the network bandwidth requirement scales linearly with the number of processors, even when no communication occurs between processes • Providing adequate bandwidth scaling may not be enough for the computational performance to scale perfectly, since the access latency increases with the number of processors • By distributing the memories across the processors, all processes can access local memory with fixed latency, independent of the number of processors; thus the computational performance of the system can scale perfectly
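The bandwidth argument above can be made concrete with a small traffic model. This is a sketch under hypothetical assumptions not taken from the slides: every processor issues `refs_per_sec` memory references of `bytes_per_ref` bytes, and `remote_fraction` is the share of references that must cross the network. In the dancehall case every reference is remote; with memory distributed across the nodes, a program whose processes touch only their own data puts nothing on the network.

```python
# Hypothetical traffic model (parameters are illustrative, not from the slides):
# each processor issues refs_per_sec memory references of bytes_per_ref bytes.

def network_demand(p: int, refs_per_sec: float, bytes_per_ref: int,
                   remote_fraction: float) -> float:
    """Aggregate bytes/s that must cross the interconnect for p processors."""
    return p * refs_per_sec * bytes_per_ref * remote_fraction


def dancehall_demand(p: int, refs_per_sec: float, bytes_per_ref: int) -> float:
    # Every memory module is across the network, so every reference is remote.
    return network_demand(p, refs_per_sec, bytes_per_ref, remote_fraction=1.0)


def distributed_demand(p: int, refs_per_sec: float, bytes_per_ref: int,
                       remote_fraction: float) -> float:
    # References to the node's own memory never enter the network.
    return network_demand(p, refs_per_sec, bytes_per_ref, remote_fraction)


if __name__ == "__main__":
    # No interprocess communication: processes touch only their own data.
    for p in (16, 64, 256):
        print(p, dancehall_demand(p, 1e6, 8), distributed_demand(p, 1e6, 8, 0.0))
```

Doubling `p` doubles the dancehall demand, while the distributed-memory demand stays at zero in this no-communication case.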
  • 10. The following assumptions are made to achieve scalable bandwidth: • It must be possible to have a very large number of concurrent transactions using different wires • They are initiated independently and without global arbitration • The effects of a transaction (such as changes of state) are directly visible only by the nodes involved in the transaction • The effects may eventually become visible to other nodes as they are propagated by additional transactions • Although it is possible to broadcast information to all nodes, broadcast bandwidth (i.e. the rate at which broadcasts can be performed) does not increase with the number of nodes
  • 11. Latency Scaling The time to transfer $n$ bytes between two nodes is given by $T(n) = \text{Overhead} + \text{Channel Time} + \text{Routing Delay}$, where Overhead is the processing time in initiating or completing the transfer, Channel Time is $n/B$ (where $B$ is the bandwidth of the thinnest channel), and Routing Delay is a function $f(H, n)$ of the number of routing steps or hops $H$ in the transfer and the number of bytes $n$ transferred
  • 12. Prob 7.1: Many classic networks are constructed out of fixed-degree switches in a configuration, or topology, such that for $n$ nodes the distance from any network input to any network output is $\log_2 n$ and the total number of switches is $\alpha n \log n$ for some small constant $\alpha$. Assume the overhead is 1 µs per message, the link bandwidth is 64 MB/s and the router delay is 200 ns per hop. How much does the time for a 128-byte transfer increase as the machine is scaled from 64 to 1,024 nodes? Solution: At 64 nodes, six hops are required, so $T_{64}(128) = 1.0\,\mu\text{s} + \frac{128\ \text{B}}{64\ \text{B}/\mu\text{s}} + 6 \times 0.2\,\mu\text{s} = 4.2\,\mu\text{s}$
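  • 13. This increases to $T_{1024}(128) = 1.0\,\mu\text{s} + 2.0\,\mu\text{s} + 10 \times 0.2\,\mu\text{s} = 5.0\,\mu\text{s}$ on a 1,024-node configuration (ten hops). Thus, the latency increases by less than 20% with a 16-fold increase in machine size. Even with this small transfer size, a store-and-forward delay would add 2 µs (the time to buffer 128 bytes) to the routing delay per hop. Thus the latency would be $T^{sf}_{64}(128) = 1.0\,\mu\text{s} + 6 \times (2.0 + 0.2)\,\mu\text{s} = 14.2\,\mu\text{s}$ at 64 nodes and $T^{sf}_{1024}(128) = 1.0\,\mu\text{s} + 10 \times (2.0 + 0.2)\,\mu\text{s} = 23\,\mu\text{s}$ at 1,024 nodes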
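The arithmetic of Prob 7.1 can be checked with a few lines of code. This is a sketch assuming 64 MB/s means 64 bytes per microsecond, and using the name "cut-through" (our label) for the case where each hop adds only the router delay, so Routing Delay $= H\Delta$; in the store-and-forward case the whole packet is buffered at every hop, so each hop costs $n/B + \Delta$.

```python
import math


def cut_through_latency(n, hops, overhead_us, bw_bytes_per_us, delay_us):
    # T(n) = Overhead + n/B + H * Delta: the packet streams through the
    # switches, so each extra hop adds only the router delay.
    return overhead_us + n / bw_bytes_per_us + hops * delay_us


def store_and_forward_latency(n, hops, overhead_us, bw_bytes_per_us, delay_us):
    # T(n) = Overhead + H * (n/B + Delta): the whole packet is buffered at
    # every hop before it is forwarded.
    return overhead_us + hops * (n / bw_bytes_per_us + delay_us)


# Parameters of Prob 7.1; 64 MB/s is taken as 64 bytes per microsecond.
OVERHEAD_US, BW, DELAY_US, N_BYTES = 1.0, 64.0, 0.2, 128


def hops_for(nodes: int) -> int:
    # The problem states the input-to-output distance is log2(nodes).
    return int(math.log2(nodes))


if __name__ == "__main__":
    for nodes in (64, 1024):
        h = hops_for(nodes)
        ct = cut_through_latency(N_BYTES, h, OVERHEAD_US, BW, DELAY_US)
        sf = store_and_forward_latency(N_BYTES, h, OVERHEAD_US, BW, DELAY_US)
        print(f"{nodes:5d} nodes, {h:2d} hops: cut-through {ct:.1f} us, "
              f"store-and-forward {sf:.1f} us")
```

Running it reproduces the worked figures: 4.2 µs and 5.0 µs without store-and-forward buffering, and 14.2 µs and 23.0 µs with it. The store-and-forward latency grows with the full packet time per hop, which is why it scales far worse.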
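  • 14. Cost Scaling: • The cost of a system may be viewed as a fixed cost for the system infrastructure plus an incremental cost of adding processors and memory to the system: $\text{Cost}(p, m) = \text{FixedCost} + \text{IncrementalCost}(p, m)$, where $p$ is the number of processors and $m$ is the amount of memory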
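A minimal sketch of the cost model, assuming (for illustration only) that the incremental cost is linear in the number of nodes, with `cost_per_node` covering one processor plus its memory. Both parameter names and the linear form are hypothetical; the point is that the fixed infrastructure cost is amortized as the machine grows.

```python
def system_cost(p: int, fixed_cost: float, cost_per_node: float) -> float:
    # Cost(p) = FixedCost + IncrementalCost(p), with the incremental part
    # assumed linear in the number of nodes (processor + memory each).
    return fixed_cost + cost_per_node * p


def cost_per_processor(p: int, fixed_cost: float, cost_per_node: float) -> float:
    # The fixed infrastructure cost is spread over more processors as p grows.
    return system_cost(p, fixed_cost, cost_per_node) / p


if __name__ == "__main__":
    for p in (1, 10, 100, 1000):
        print(p, system_cost(p, 1000.0, 50.0), cost_per_processor(p, 1000.0, 50.0))
```

Under this assumption the per-processor cost approaches `cost_per_node` from above; a real machine's incremental cost may grow faster than linearly, for example when the network must grow to keep bandwidth per node constant.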
  • 15. Realizing Programming Models • Here we examine what is required to implement programming models on large distributed-memory machines • These machines have been most strongly associated with message-passing programming models • Shared address space programming models have become increasingly important and well represented
  • 16. • Recall the concept of a communication abstraction, which defines the set of communication primitives provided to the user • These primitives could be realized directly in hardware, via system software, or through some combination of the two, as shown in fig below
  • 17. • In large-scale parallel machines the programming model is realized in a similar manner, except that the primitive events are transactions across the network, that is, network transactions rather than bus transactions • A network transaction is a one-way transfer of information from an output buffer at the source to an input buffer at the destination that causes some kind of action at the destination, the occurrence of which is not directly visible at the source, as shown in fig (next slide)
  • 18.
  • 19. • Primitive Network Transactions • Before a bus transaction is started, a protection check has been performed as part of the virtual-to-physical address translation • The format of information in a bus transaction is determined by the physical wires of the bus, i.e. the data lines, address lines and command lines • The information to be transferred onto the bus is held in special output registers, viz. the address, command and data registers, until it can be driven onto the bus
  • 20. • A bus transaction begins with arbitration for the medium • Most buses employ a global arbitration scheme where a processor requesting a transaction asserts a bus request line and waits for the corresponding bus grant • The destination of the transaction is implicit in the address • Each module on the bus is configured to respond to a set of physical addresses
  • 21. • All modules examine the address and one responds to the transaction • If none responds, the bus controller detects the time-out and aborts the transaction • Each module includes a set of input registers, capable of buffering any request to which it might respond • Each bus transaction involves a request followed by a response • In the case of a read, the response is the data and an associated completion signal • For a write it is just the completion acknowledgement
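The address decoding and time-out behavior described above can be sketched as follows. The class names are hypothetical, and each module's "set of physical addresses" is assumed to be a single contiguous range, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class BusModule:
    name: str
    base: int  # first physical address this module responds to
    size: int  # number of addresses in its (assumed contiguous) range

    def responds_to(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


class BusTimeout(Exception):
    """Raised when no module claims the address (the controller's time-out)."""


class Bus:
    def __init__(self, modules):
        self.modules = list(modules)

    def transaction(self, addr: int) -> str:
        # Every module examines the address; the destination is implicit in it.
        responders = [m for m in self.modules if m.responds_to(addr)]
        if not responders:
            # Nobody answered: the bus controller aborts the transaction.
            raise BusTimeout(hex(addr))
        return responders[0].name
```

For example, with `mem0` covering `0x0000` to `0x0FFF` and `mem1` covering `0x1000` to `0x1FFF`, a transaction to `0x1800` is answered by `mem1`, and one to `0x5000` times out.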
  • 22. • In either case, both the source and destination are informed of the completion of the transaction • In split-transaction buses, the response phase of the transaction may require rearbitration and may be performed in a different order than the requests • Care is required to avoid deadlock with split transactions because a module on the bus may be both requesting and servicing transactions
  • 23. • The module must continue servicing bus requests and accept replies while it is attempting to present its own request • The bus design ensures that, for any transaction that might be placed on the bus, sufficient input buffering exists to accept the transaction at the destination • This can be accomplished by providing enough resources or by adding a negative acknowledgement signal (NACK)
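The NACK alternative can be sketched as a module with a bounded input buffer. This is a hypothetical model, not a specific bus protocol: when the buffer is full the module refuses the request with a NACK instead of stalling the bus, and the requester is expected to retry later.

```python
from collections import deque


class Module:
    """Bus module with a bounded input buffer (hypothetical model)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.inbox = deque()

    def accept(self, request) -> str:
        # Refuse rather than block when there is no room to buffer the request.
        if len(self.inbox) >= self.capacity:
            return "NACK"
        self.inbox.append(request)
        return "ACK"

    def service_one(self):
        # Draining requests is what frees buffer space; a module must keep
        # doing this even while its own request is pending, or it can deadlock.
        return self.inbox.popleft() if self.inbox else None
```

With a capacity of two, a third request is NACKed until the module services one of the buffered requests.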
  • 24. Issues present in a network transaction • Protection: As the number of components becomes larger, the coupling between components becomes looser and the individual components become more complex, there are limits to how much each component trusts the others to operate correctly. In a scalable system, individual components will often perform checks on the network transaction so that an errant program or faulty hardware component cannot corrupt other components of the system
  • 25. Format: Most network links are narrow, so the information associated with a transaction is transferred as a serial stream. Typical links are a few (1 to 16) bits wide. The format of the transaction is dictated by how the information is serialized onto the link. Thus there is a great deal of flexibility in this aspect of design. The information in a network transaction is an envelope with more information inside. The envelope includes the information the physical network needs to get the packet from its source to its destination port. Some networks are designed to deliver only fixed-size packets; others can deliver variable-size packets.
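Serializing a packet onto a narrow link can be illustrated as below. This is a sketch under our own assumptions: bits are sent most significant first, and the tail is zero-padded so the last link-width unit (often called a phit) is full width. Real networks choose their own bit order and framing.

```python
import math


def link_cycles(packet_bytes: int, link_width_bits: int) -> int:
    """Cycles needed to push a packet onto a link that is link_width_bits wide."""
    return math.ceil(packet_bytes * 8 / link_width_bits)


def serialize(packet: bytes, link_width_bits: int) -> list:
    """Split a packet into link-width units, most significant bits first."""
    total_bits = len(packet) * 8
    value = int.from_bytes(packet, "big")
    cycles = link_cycles(len(packet), link_width_bits)
    # Zero-pad the tail so the final unit is full width.
    value <<= cycles * link_width_bits - total_bits
    mask = (1 << link_width_bits) - 1
    return [(value >> (link_width_bits * (cycles - 1 - i))) & mask
            for i in range(cycles)]
```

A 16-byte packet takes 16 cycles on an 8-bit link but 128 cycles on a 1-bit link, which is why narrow links make the channel time $n/B$ significant.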
  • 26. Output Buffering: The source must provide storage to hold information that is to be serialized onto the link, either in registers, FIFOs or memory. Since network transactions are one-way and can potentially be pipelined, it may be desirable to provide a queue of output registers. If the packet format is variable up to some moderate size, a similar approach may be adopted where each entry in the output buffer is of variable size. If a packet can be quite long, then typically the output controller contains a buffer of descriptors, pointing to the data in memory.
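The descriptor approach for long packets can be sketched as a bounded queue of (address, length) records. The class and method names are hypothetical; the point is that the processor posts a small descriptor rather than copying the packet, and the controller reads the data from memory only when it is ready to serialize it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Points at packet data that stays in memory until it is sent."""
    addr: int
    length: int


class OutputController:
    def __init__(self, memory: bytearray, max_descriptors: int):
        self.memory = memory
        self.max_descriptors = max_descriptors
        self.queue = deque()

    def post(self, addr: int, length: int) -> bool:
        # Enqueue a descriptor; report failure if the descriptor buffer is full.
        if len(self.queue) >= self.max_descriptors:
            return False
        self.queue.append(Descriptor(addr, length))
        return True

    def next_packet(self) -> bytes:
        # Fetch the packet body from memory at the moment it is serialized.
        d = self.queue.popleft()
        return bytes(self.memory[d.addr:d.addr + d.length])
```

Because the data is read at send time, the source must not overwrite the memory region until its descriptor has been consumed.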
  • 27. Media arbitration: There is no global arbitration for access to the network and many network transactions can be initiated simultaneously. Initiation of the network transaction places an implicit claim on resources in the communication path from the source to the destination as well as on resources at the destination. These resources are potentially shared with other transactions. Local arbitration is performed at the source to determine whether or not to initiate the transaction. The resources are allocated incrementally as the message moves forward.
  • 28. Destination name and routing: The source must be able to specify enough information to cause the transaction to be routed to the appropriate destination. There are many variations in how routing is specified and performed, but basically the source performs a translation from some logical name for the destination to some form of physical address.
  • 29. • Input buffering: At the destination, the information in the network transaction must be transferred from the physical link into some storage element. This may be simple registers or a queue, or it may be delivered directly into memory. The input buffer is in some sense a shared resource used by many remote processors. • Action: The action taken at the destination may be very simple or complex. In either case, it may involve initiating a response.
  • 30. • Completion detection: The source has an indication that the transaction has been delivered into the network but usually no indication that it has arrived at its destination. This completion must be inferred from a response, an acknowledgement or some additional transaction. • Transaction ordering: In a network the ordering is quite weak. Some networks ensure that a sequence of transactions from a given source to a single destination will be seen in order at the destination; others will not even provide this assurance. In either case no node can perceive the global order.
  • 31. • Deadlock avoidance: Most modern networks are deadlock-free as long as the modules on the network continue to accept transactions. Within the network, this may require restrictions on permissible routes or other special precautions. • Delivery guarantees: A fundamental decision in the design of a scalable network is the behavior when the destination buffer is full. This is clearly an issue on an end-to-end basis, since it is necessary for the source to know whether the destination input buffer is available when it is attempting to initiate a transaction. It is also an issue on a link-by-link basis within the network itself.
  • 33. • Realizing the shared address space communication abstraction requires a two-way request-response protocol, as shown in fig (previous slide) • A global address is decomposed into a module number and a local address. • For a read operation, a request is sent to the designated module requesting a load of the desired address and specifying enough information to allow the result to be returned to the requestor through a response network transaction.
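One common way to decompose a global address is to treat the high-order bits as the module number and the low-order bits as the local address. The sketch below assumes that bit-field layout and a hypothetical 32-bit local field; actual machines may interleave addresses across modules or use translation tables instead.

```python
LOCAL_BITS = 32  # hypothetical width of the local-address field


def decompose(global_addr: int, local_bits: int = LOCAL_BITS):
    """Split a global address into (module number, local address)."""
    module = global_addr >> local_bits
    local = global_addr & ((1 << local_bits) - 1)
    return module, local


def compose(module: int, local: int, local_bits: int = LOCAL_BITS) -> int:
    """Inverse of decompose: rebuild the global address."""
    return (module << local_bits) | local
```

The module number selects the destination node for the request transaction, and the local address travels inside it.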
  • 34. • A write is similar, except that the data is conveyed with the address and command to the designated module and the response is merely an acknowledgement to the requestor that the write has been performed. The response informs the source that the request has been received or serviced, depending on whether it is generated before or after the remote action.
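The read and write exchanges described above can be modeled as request and response records handled by a memory module. The message fields and class names are hypothetical. The request carries the requestor's node number so the response can be routed back; the write acknowledgement here is generated after the store, so it means "the write has been performed" rather than merely "received".

```python
from dataclasses import dataclass


@dataclass
class Request:
    op: str          # "read" or "write"
    src: int         # requesting node, so the response can be routed back
    local_addr: int
    data: int = 0    # only meaningful for writes


@dataclass
class Response:
    dest: int
    op: str          # "read" (carries data) or "ack" (write performed)
    data: int = 0


class MemoryModule:
    def __init__(self):
        self.words = {}

    def handle(self, req: Request) -> Response:
        if req.op == "read":
            # Perform the load and return the value to the requestor.
            return Response(req.src, "read", self.words.get(req.local_addr, 0))
        if req.op == "write":
            self.words[req.local_addr] = req.data
            # Sent after the remote action, so it confirms completion.
            return Response(req.src, "ack")
        raise ValueError(f"unknown operation {req.op!r}")
```

Generating the acknowledgement before the store would let the source proceed sooner, at the cost of only knowing that the request was received.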
  • 35.
  • 36. • A send/receive pair in the message-passing model is conceptually a one-way transfer from a source area specified by the source user process to a destination area specified by the destination user process. • In addition, it embodies a pairwise synchronization event between the two processes. • The Message Passing Interface (MPI) distinguishes the notion of when a call to a send or receive function returns from when a message operation completes.
  • 37. • A synchronous send completes once the matching receive has executed; at that point the source data buffer can be reused and the data is guaranteed to arrive in the destination receive buffer. • A buffered send completes as soon as the source data buffer can be reused, independent of whether the matching receive has been issued; the data may have been transmitted or it may be buffered somewhere in the system.
  • 38. • Buffered send completion is asynchronous with respect to the receiver process • A receive completes when the message data is present in the receive destination buffer. • A blocking function, send or receive, returns only after the message operation completes • A nonblocking function returns immediately, regardless of message completion, and additional calls to a probe function are used to detect completion
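The completion rules above can be illustrated with a single-threaded model of one sender/receiver pair. This is a hypothetical sketch, not the MPI API: `isend` returns immediately like a nonblocking call, and `test()` plays the role of the probe. A buffered send copies the data and is complete at once; a synchronous send keeps a reference to the source buffer and completes only when the matching receive executes.

```python
class SendRequest:
    def __init__(self, data, synchronous: bool):
        # A buffered send copies the data, so the source buffer is reusable
        # at once; a synchronous send still references the caller's buffer.
        self.data = data if synchronous else list(data)
        self.synchronous = synchronous
        self.matched = False

    def test(self) -> bool:
        """Probe for completion without blocking."""
        return self.matched or not self.synchronous


class Channel:
    """Hypothetical single-pair channel; not the MPI API."""

    def __init__(self):
        self.pending = []

    def isend(self, data, synchronous: bool = False) -> SendRequest:
        # Nonblocking: returns immediately regardless of completion.
        req = SendRequest(data, synchronous)
        self.pending.append(req)
        return req

    def recv(self):
        # The receive completes once the data is in the destination buffer.
        if not self.pending:
            raise RuntimeError("no matching send; a blocking receive would wait")
        req = self.pending.pop(0)
        req.matched = True
        return list(req.data)
```

A blocking send in this model would simply repeat `test()` until it returns true; for a synchronous send that requires the receiver to run, which is the pairwise synchronization the slides describe.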
  • 39. • The protocols are concerned only with message operation and completion, regardless of whether the functions are blocking