Scalable
Multiprocessors
SCALABILITY
• Almost all computers allow the capacity of the system to be
increased in some form, for example by adding memory, I/O
cards, disks or upgraded processor(s), but the increase
typically has hard limits
• A scalable system attempts to avoid inherent design limits on
the extent to which resources can be added to the system
• Four aspects of scalability:
– How does the bandwidth or throughput of the system increase with additional
processors?
– How does the latency or time per operation increase?
– How does the cost of the system increase?
– How do we actually package the systems and put them together?
Bandwidth Scaling
• If a large number of processors are to exchange
information simultaneously with many other
processors or memories, a large number of
independent wires must connect them.
• Thus scalable machines must be organized in
the manner shown in figure (next slide) where
a large number of processor modules and
memory modules are connected together by
independent wires through a large number of
switches
• A switch may be realized by a bus, a crossbar or even a collection
of multiplexers
• The number of outputs (or inputs) to the switch is called the
degree of the switch
• Switches are limited in scale but may be interconnected to form
large configurations, that is, networks
• A controller determines which inputs are to be
connected to which outputs at each instant in time
• A network switch is a more general-purpose device, in which the
information presented at the input is enough for the switch
controller to determine the proper output without consulting all
the nodes
• Pairs of modules are connected by routes through network switches
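The switch behavior described above can be sketched in code. This is a minimal, hypothetical model (the class, the routing function and the fixed-priority arbitration are all illustrative assumptions, not a real switch design): the controller inspects only the destination carried by each input message to pick an output port, without consulting all the nodes.

```python
# Toy model of a network switch: the controller selects an output port
# from the destination field alone; conflicting inputs lose arbitration.
class CrossbarSwitch:
    def __init__(self, degree, route):
        self.degree = degree        # number of input (and output) ports
        self.route = route          # maps a destination id -> output port

    def schedule(self, inputs):
        """inputs: {input_port: (dest, payload)}.
        Returns {output_port: (dest, payload)}; at most one message wins
        each output port per cycle (losers would be buffered and retried)."""
        outputs = {}
        for port in sorted(inputs):             # fixed-priority arbitration
            dest, payload = inputs[port]
            out = self.route(dest)
            if out not in outputs:
                outputs[out] = (dest, payload)
        return outputs

sw = CrossbarSwitch(degree=4, route=lambda dest: dest % 4)
print(sw.schedule({0: (5, "a"), 1: (2, "b"), 2: (9, "c")}))
# -> {1: (5, 'a'), 2: (2, 'b')}  (destinations 5 and 9 conflict on port 1)
```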
• The most common structure for scalable
machines is illustrated by the generic
architecture shown in fig (next slide)
• Here one or more processors are packaged
together with one or more memory modules
and a communication assist as an easily
replicated unit, which is called a node
• The intranode switch is typically a high-
performance bus
In the dancehall configuration, processing nodes are separated from
memory nodes by the network
• If the memory modules are on the opposite side
of the interconnect, as in fig (previous slide) the
network bandwidth requirement scales linearly
with the number of processors, even when no
communication occurs between processes
• Providing adequate bandwidth scaling may not
be enough for the computational performance to
scale perfectly since the access latency increases
with the number of processors
• By distributing the memories across the
processors, all processes can access local memory
with fixed latency, independent of the number of
processors; thus the computational performance
of the system can scale perfectly
The following assumptions are made to achieve scalable
bandwidth:
• It must be possible to have a very large number of
concurrent transactions using different wires
• They are initiated independently and without global
arbitration
• The effects of a transaction (such as changes of state)
are directly visible only by the nodes involved in the
transaction
• The effects may eventually become visible to other
nodes as they are propagated by additional
transactions
• Although it is possible to broadcast information to all
nodes, broadcast bandwidth (i.e., the rate at which
broadcasts can be performed) does not increase with
the number of nodes
Latency Scaling
The time to transfer n bytes between two nodes
is given by
T(n) = Overhead + Channel Time + Routing Delay
where Overhead is the processing time in
initiating or completing the transfer,
Channel Time is n/B (where B is the bandwidth
of the thinnest channel), and
Routing Delay is a function f(H, n) of the number
of routing steps or hops H in the transfer and the
number of bytes n transferred
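The latency model above can be written directly as a function. One assumption is made here that the text leaves open: the routing delay f(H, n) is taken to be H times a fixed per-hop delay, as in the cut-through example that follows.

```python
# Sketch of T(n) = Overhead + n/B + f(H, n), assuming f(H, n) = H * hop_delay.
# Note 1 MB/s = 1 byte/us, so bytes / (MB/s) gives microseconds directly.
def transfer_time_us(n_bytes, overhead_us, bandwidth_mb_s, hops, hop_delay_us):
    channel_time = n_bytes / bandwidth_mb_s
    routing_delay = hops * hop_delay_us
    return overhead_us + channel_time + routing_delay

# 128 bytes, 1 us overhead, 64 MB/s link, 6 hops of 200 ns:
print(transfer_time_us(128, 1.0, 64, 6, 0.2))   # 4.2 us
```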
Prob 7.1: Many classic networks are
constructed out of fixed-degree switches in a
configuration or topology such that, for n nodes,
the distance from any network input to any
network output is log2 n and the total number of
switches is α n log n for some small constant α.
Assume the overhead is 1 µs per message, the
link bandwidth is 64 MB/s and the router delay
is 200 ns per hop. How much does the time for
a 128-byte transfer increase as the machine is
scaled from 64 to 1,024 nodes?
Solution: At 64 nodes, six hops are required, so
T(128) = 1 µs + 128 B / (64 MB/s) + 6 × 200 ns
= 1 + 2 + 1.2 = 4.2 µs
This increases to 5 µs on a 1,024-node
configuration (ten hops). Thus, the latency
increases by less than 20% with a 16-fold
increase in machine size. Even with this small
transfer size, a store-and-forward delay would
add 2 µs (the time to buffer 128 bytes) to the
routing delay per hop. Thus the latency would be
T(128) = 1 + 2 + 6 × (0.2 + 2) = 16.2 µs
at 64 nodes and
T(128) = 1 + 2 + 10 × (0.2 + 2) = 25 µs
at 1,024 nodes
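The Prob 7.1 arithmetic can be checked with a few lines of code, comparing cut-through routing against store-and-forward (where each hop must first buffer the whole packet):

```python
# Reproduce the Prob 7.1 numbers: 1 us overhead, 64 MB/s links,
# 200 ns router delay, log2(nodes) hops.
import math

def latency_us(n_bytes, nodes, store_and_forward=False):
    overhead, bandwidth = 1.0, 64.0          # us, bytes/us (64 MB/s)
    hops = int(math.log2(nodes))
    hop_delay = 0.2                          # 200 ns router delay per hop
    if store_and_forward:
        hop_delay += n_bytes / bandwidth     # + time to buffer the packet
    return overhead + n_bytes / bandwidth + hops * hop_delay

print(latency_us(128, 64))                         # 4.2 us
print(latency_us(128, 1024))                       # 5.0 us
print(latency_us(128, 64, store_and_forward=True))     # 16.2 us
print(latency_us(128, 1024, store_and_forward=True))   # 25.0 us
```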
Cost Scaling:
• The cost of the system may be viewed as a fixed
cost for the system infrastructure plus an
incremental cost of adding processors and
memory to the system:
Cost(p, m) = Fixed Cost + Incremental Cost(p, m)
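A toy version of this fixed-plus-incremental cost model is sketched below. All the dollar figures are invented for illustration; the point is only that the fixed infrastructure cost is amortized as the machine grows, so cost scales sublinearly in processors.

```python
# Illustrative cost model: fixed infrastructure cost plus per-processor
# and per-gigabyte increments (all figures are made-up examples).
def system_cost(processors, mem_gb,
                fixed=50_000, per_proc=2_000, per_gb=100):
    return fixed + processors * per_proc + mem_gb * per_gb

small = system_cost(1, 4)       # 52,400
large = system_cost(64, 256)    # 203,600
print(large / small)            # ~3.9x cost for 64x the processors
```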
Realizing Programming Models
• Here we examine what is required to
implement programming models on large
distributed-memory machines
• These machines have been most strongly
associated with message-passing
programming models
• Shared address space programming models
have become increasingly important and
well represented
• Recall the concept of a communication abstraction, which
defines the set of communication primitives provided to the user
• These could be realized directly in hardware, in system
software, or through some combination of the two, as shown
in fig below
• In large-scale parallel machines the
programming model is realized in a similar
manner, except that the primitive events are
transactions across the network, that is,
network transactions rather than bus
transactions
• A network transaction is a one-way transfer of
information from an output buffer at the
source to an input buffer at the destination
that causes some kind of action at the
destination, the occurrence of which is not
directly visible at the source, as shown in fig
(next slide)
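The definition above can be made concrete as data: a transaction carries routing information, a command that selects the action at the destination, and a payload, and it lands in an input buffer without any direct indication at the source. The field names and classes below are illustrative, not a real machine's packet format.

```python
# A network transaction sketched as data: a one-way transfer from an
# output buffer at the source into an input buffer at the destination,
# triggering some action there.
from collections import deque
from dataclasses import dataclass

@dataclass
class Transaction:
    dest: int        # destination node (routing information)
    kind: str        # which action to perform on arrival
    payload: bytes

class Node:
    def __init__(self):
        self.input_buffer = deque()

    def deliver(self, txn):
        # The source gets no direct indication of this arrival; completion
        # must be inferred from a later response transaction.
        self.input_buffer.append(txn)

dst = Node()
dst.deliver(Transaction(dest=3, kind="write", payload=b"\x2a"))
print(len(dst.input_buffer))   # 1
```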
Primitive Network Transactions
• Before a bus transaction starts, a protection
check is performed as part of the
virtual-to-physical address translation
• The format of information in a bus transaction
is determined by the physical wires of the bus,
i.e. the data lines, address lines and command
lines
• The information to be transferred onto the
bus is held in special output registers viz.,
address, command and data registers until it
can be driven onto the bus
• A bus transaction begins with arbitration for
the medium
• Most buses employ a global arbitration
scheme where a processor requesting a
transaction asserts a bus request line and
waits for the corresponding bus grant
• The destination of the transaction is implicit in
the address
• Each module on the bus is configured to
respond to a set of physical addresses
• All modules examine the address and one
responds to the transaction
• If none responds, the bus controller detects the
time-out and aborts the transaction
• Each module includes a set of input registers,
capable of buffering any request to which it might
respond
• Each bus transaction involves a request followed
by a response
• In the case of a read, the response is the data and
an associated completion signal
• For a write it is just the completion
acknowledgement
• In either case, both the source and destination
are informed of the completion of the
transaction
• In split-transaction buses, the response phase
of the transaction may require rearbitration
and may be performed in a different order
than the requests
• Care is required to avoid deadlock with split
transactions because a module on the bus
may be both requesting and servicing
transactions
• The module must continue servicing bus
requests and accept replies while it is
attempting to present its own request
• The bus design ensures that, for any
transaction that might be placed on the bus,
sufficient input buffering exists to accept the
transaction at the destination
• This can be accomplished by providing enough
resources or by adding a negative
acknowledgement signal (NACK)
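The NACK alternative just described can be sketched as a destination with bounded input buffering: when the buffer is full, the transaction is refused and the source must retry. The class and return strings are illustrative, not a real bus protocol.

```python
# Bounded input buffering with negative acknowledgement: a full
# destination NACKs the transaction instead of losing it.
from collections import deque

class Destination:
    def __init__(self, capacity):
        self.buffer = deque()
        self.capacity = capacity

    def accept(self, txn):
        if len(self.buffer) >= self.capacity:
            return "NACK"            # source must retry later
        self.buffer.append(txn)
        return "ACK"

d = Destination(capacity=2)
results = [d.accept(t) for t in ("t1", "t2", "t3")]
print(results)   # ['ACK', 'ACK', 'NACK']
```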
Issues present in a network transaction
• Protection: As the number of components
becomes larger, the coupling between
components looser and the individual
components more complex, there are limits
to how much each component can trust the
others to operate correctly. In a scalable
system, individual components will often
perform checks on the network transaction so
that an errant program or faulty hardware
component cannot corrupt other components
of the system
Format: Most network links are narrow, so the
information associated with a transaction is
transferred as a serial stream. Typical links are a
few (1 to 16) bits wide. The format of the
transaction is dictated by how the information is
serialized onto the link. Thus there is a great deal
of flexibility in this aspect of design. The
information in a network transaction is an
envelope with more information inside. The
envelope includes information pertaining to the
physical network to get the packet from its
source to its destination port. Some networks
are designed to deliver only fixed-size packets;
others can deliver variable-size packets.
Output Buffering: The source must provide
storage to hold information that is to be
serialized onto the link, either in registers,
FIFOs or memory. Since network transactions
are one-way and can potentially be pipelined,
it may be desirable to provide a queue of
output registers. If the packet format is
variable up to some moderate size, a similar
approach may be adopted where each entry
in the output buffer is of variable size. If a
packet can be quite long, then typically the
output controller contains a buffer of
descriptors, pointing to the data in memory.
Media arbitration: There is no global arbitration
for access to the network and many network
transactions can be initiated simultaneously.
Initiation of the network transaction places an
implicit claim on resources in the
communication path from the source to the
destination as well as on resources at the
destination. These resources are potentially
shared with other transactions. Local
arbitration is performed at the source to
determine whether or not to initiate the
transaction. The resources are allocated
incrementally as the message moves forward.
Destination name and routing:
The source must be able to specify enough
information to cause the transaction to be
routed to the appropriate destination. There
are many variations in how routing is specified
and performed, but basically the source
performs a translation from some logical
name for the destination to some form of
physical address.
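The logical-to-physical translation described above can be sketched as a small lookup at the source. The table contents, the four-port router and the helper names are all hypothetical, chosen only to illustrate the translation step:

```python
# The source translates a logical destination name into a physical
# node/port before the transaction enters the network.
route_table = {              # logical process id -> physical node id
    "p0": 0, "p1": 5, "p2": 12,
}

def resolve(logical_name):
    node = route_table[logical_name]          # translation step
    return {"node": node, "port": node % 4}   # e.g. 4 ports per router

print(resolve("p2"))   # {'node': 12, 'port': 0}
```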
• Input buffering: At the destination, the
information in the network transaction must
be transferred from the physical link into some
storage element. This may be simple registers
or a queue or it may be delivered directly into
memory. The input buffer is in some sense a
shared resource used by many remote
processors.
• Action: The action taken at the destination
may be very simple or complex. In either case,
it may involve initiating a response.
• Completion detection: The source has an
indication that the transaction has been delivered
into the network but usually no indication that it
has arrived at its destination. This completion
must be inferred from a response, an
acknowledgement or some additional
transaction.
• Transaction ordering: In a network the ordering
is quite weak. Some networks ensure that a
sequence of transactions from a given source to a
single destination will be seen in order at the
destination; others will not even provide this
assurance. In either case no node can perceive
the global order.
• Deadlock avoidance: Most modern networks are
deadlock free as long as the modules on the
network continue to accept transactions. Within
the network, this may require restrictions on
permissible routes or other special precautions.
• Delivery guarantees: A fundamental decision in
the design of a scalable network is the behavior
when the destination buffer is full. This is clearly
an issue on an end-to-end basis since it is
necessary for the source to know whether the
destination input buffer is available when it is
attempting to initiate a transaction. It is also an
issue on a link-by-link basis within the network
itself.
Shared Address Space
• Realizing the shared address space
communication abstraction requires a two-
way request-response protocol, as shown in
fig (previous slide)
• A global address is decomposed into a module
number and a local address.
• For a read operation, a request is sent to the
designated module requesting a load of the
desired address and specifying enough
information to allow the result to be returned
to the requestor through a response network
transaction.
• A write is similar, except that the data is
conveyed with the address and command to
the designated module and the response is
merely an acknowledgement to the requestor
that the write has been performed. The
response informs the source that the request
has been received or serviced, depending on
whether it is generated before or after the
remote action.
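The read and write protocols above can be sketched together. The address split (here 16 local-address bits), the per-module memories as dictionaries, and the function names are all illustrative assumptions; the structure matches the text: decompose the global address, send the request to the designated module, and return either data (read) or an acknowledgement (write).

```python
# Two-way request-response sketch for a shared address space:
# global address = (module number, local address).
LOCAL_BITS = 16

def decompose(global_addr):
    return global_addr >> LOCAL_BITS, global_addr & ((1 << LOCAL_BITS) - 1)

memories = {0: {}, 1: {}}      # per-module local memories

def remote_read(global_addr):
    module, local = decompose(global_addr)
    return memories[module].get(local, 0)    # response carries the data

def remote_write(global_addr, value):
    module, local = decompose(global_addr)
    memories[module][local] = value
    return "ack"                             # response is just completion

print(remote_write((1 << 16) | 0x10, 42))    # ack
print(remote_read((1 << 16) | 0x10))         # 42
```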
• A send/receive pair in the message-passing
model is conceptually a one-way transfer
from a source area specified by the source
user process to a destination area specified by
the destination user process.
• In addition, it embodies a pairwise
synchronization event between the two
processes.
• The Message Passing Interface (MPI)
distinguishes the notion of when a call to a send or receive
the notion of when a call to a send or receive
function returns from when a message
operation completes.
• A synchronous send completes once the
matching receive has executed; the source
data buffer can then be reused and the data is
guaranteed to arrive in the destination receive
buffer.
• A buffered send completes as soon as the
source data buffer can be reused,
independent of whether the matching receive
has been issued; the data may have been
transmitted or it may be buffered somewhere
in the system.
• Buffered send completion is asynchronous
with respect to the receiver process
• A receive completes when the message data is
present in the receive destination buffer.
• A blocking function, send or receive, returns
only after the message operation completes
• A non-blocking function returns immediately,
regardless of message completion, and
additional calls to a probe function are used to
detect completion
• The protocols are concerned only with
message operation and completion, regardless
of whether the functions are blocking
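The blocking/non-blocking distinction can be illustrated with a background thread standing in for the network; this is not real MPI (mpi4py's MPI.Isend/Request.Test pair behaves analogously), and the class and function names are invented for the sketch:

```python
# Non-blocking send returns a handle immediately; a probe checks
# completion without waiting, while a wait blocks until completion.
import threading
import time

class SendHandle:
    def __init__(self):
        self.done = threading.Event()

def isend(data, handle):
    def transmit():
        time.sleep(0.05)       # stand-in for transmission time
        handle.done.set()      # the message operation completes
    threading.Thread(target=transmit).start()
    return handle              # returns immediately: non-blocking

h = isend(b"payload", SendHandle())
print(h.done.is_set())         # probe right away: False, still in flight
h.done.wait()                  # blocking: return only after completion
print(h.done.is_set())         # True
```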
Scalable multiprocessors
Scalable multiprocessors
Scalable multiprocessors

More Related Content

What's hot

What's hot (20)

Cyclic Redundancy Check
Cyclic Redundancy CheckCyclic Redundancy Check
Cyclic Redundancy Check
 
Basic ops concept of comp
Basic ops  concept of compBasic ops  concept of comp
Basic ops concept of comp
 
Computer architecture virtual memory
Computer architecture virtual memoryComputer architecture virtual memory
Computer architecture virtual memory
 
Cache memory
Cache memoryCache memory
Cache memory
 
Ethernet protocol
Ethernet protocolEthernet protocol
Ethernet protocol
 
Cache memory principles
Cache memory principlesCache memory principles
Cache memory principles
 
Memory management
Memory managementMemory management
Memory management
 
Unit 4 data link layer
Unit 4 data link layerUnit 4 data link layer
Unit 4 data link layer
 
Cs8591 Computer Networks
Cs8591 Computer NetworksCs8591 Computer Networks
Cs8591 Computer Networks
 
3.Medium Access Control
3.Medium Access Control3.Medium Access Control
3.Medium Access Control
 
8086 memory segmentation
8086 memory segmentation8086 memory segmentation
8086 memory segmentation
 
Ch2 network models
Ch2 network modelsCh2 network models
Ch2 network models
 
Chapter 3
Chapter 3Chapter 3
Chapter 3
 
Pipelining and ILP (Instruction Level Parallelism)
Pipelining and ILP (Instruction Level Parallelism) Pipelining and ILP (Instruction Level Parallelism)
Pipelining and ILP (Instruction Level Parallelism)
 
Congestion control
Congestion controlCongestion control
Congestion control
 
Ethernet
EthernetEthernet
Ethernet
 
Lecture 3 threads
Lecture 3   threadsLecture 3   threads
Lecture 3 threads
 
HIGH SPEED NETWORKS
HIGH SPEED NETWORKSHIGH SPEED NETWORKS
HIGH SPEED NETWORKS
 
Dcn ppt by roma
Dcn ppt by romaDcn ppt by roma
Dcn ppt by roma
 
Semiconductor memory
Semiconductor memorySemiconductor memory
Semiconductor memory
 

Viewers also liked

Lecture 6.1
Lecture  6.1Lecture  6.1
Lecture 6.1Mr SMAK
 
Open and closed queueing network
Open and closed queueing networkOpen and closed queueing network
Open and closed queueing networkFahmida Afrin
 
Chapter 17 management (10 th edition) by robbins and coulter
Chapter 17 management (10 th edition) by robbins and coulterChapter 17 management (10 th edition) by robbins and coulter
Chapter 17 management (10 th edition) by robbins and coulterMd. Abul Ala
 
hierarchical bus system
 hierarchical bus system hierarchical bus system
hierarchical bus systemElvis Jonyo
 

Viewers also liked (6)

Lecture 6.1
Lecture  6.1Lecture  6.1
Lecture 6.1
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
Bus interconnection
Bus interconnectionBus interconnection
Bus interconnection
 
Open and closed queueing network
Open and closed queueing networkOpen and closed queueing network
Open and closed queueing network
 
Chapter 17 management (10 th edition) by robbins and coulter
Chapter 17 management (10 th edition) by robbins and coulterChapter 17 management (10 th edition) by robbins and coulter
Chapter 17 management (10 th edition) by robbins and coulter
 
hierarchical bus system
 hierarchical bus system hierarchical bus system
hierarchical bus system
 

Similar to Scalable multiprocessors

DS Unit-4-Communication .pdf
DS Unit-4-Communication .pdfDS Unit-4-Communication .pdf
DS Unit-4-Communication .pdfSantoshUpreti6
 
Mobile computing unit-5
Mobile computing unit-5Mobile computing unit-5
Mobile computing unit-5Ramesh Babu
 
Networkprotocolstructurescope 130719081246-phpapp01
Networkprotocolstructurescope 130719081246-phpapp01Networkprotocolstructurescope 130719081246-phpapp01
Networkprotocolstructurescope 130719081246-phpapp01Gaurav Goyal
 
Network protocol structure scope
Network protocol structure scopeNetwork protocol structure scope
Network protocol structure scopeSanat Maharjan
 
Networking and Internetworking Devices
Networking and Internetworking DevicesNetworking and Internetworking Devices
Networking and Internetworking Devices21viveksingh
 
Osi layer and network protocol
Osi layer and network protocolOsi layer and network protocol
Osi layer and network protocolNayan Sarma
 
Point to point interconnect
Point to point interconnectPoint to point interconnect
Point to point interconnectKinza Razzaq
 
basics of computer network
basics of computer networkbasics of computer network
basics of computer networkProf Ansari
 
A distributed three hop routing protocol to increase the
A distributed three hop routing protocol to increase theA distributed three hop routing protocol to increase the
A distributed three hop routing protocol to increase theKamal Spring
 
CH03 COMBUTER 000000000000000000000.pptx
CH03 COMBUTER 000000000000000000000.pptxCH03 COMBUTER 000000000000000000000.pptx
CH03 COMBUTER 000000000000000000000.pptx227567
 
Switching techniques
Switching techniquesSwitching techniques
Switching techniquesGLIM Digital
 
Switching techniques
Switching techniquesSwitching techniques
Switching techniquesGupta6Bindu
 

Similar to Scalable multiprocessors (20)

DS Unit-4-Communication .pdf
DS Unit-4-Communication .pdfDS Unit-4-Communication .pdf
DS Unit-4-Communication .pdf
 
Chapter 4
Chapter 4Chapter 4
Chapter 4
 
Mobile computing unit-5
Mobile computing unit-5Mobile computing unit-5
Mobile computing unit-5
 
Networkprotocolstructurescope 130719081246-phpapp01
Networkprotocolstructurescope 130719081246-phpapp01Networkprotocolstructurescope 130719081246-phpapp01
Networkprotocolstructurescope 130719081246-phpapp01
 
Network protocol structure scope
Network protocol structure scopeNetwork protocol structure scope
Network protocol structure scope
 
Networking and Internetworking Devices
Networking and Internetworking DevicesNetworking and Internetworking Devices
Networking and Internetworking Devices
 
Unit2.2
Unit2.2Unit2.2
Unit2.2
 
Osi layer and network protocol
Osi layer and network protocolOsi layer and network protocol
Osi layer and network protocol
 
Point to point interconnect
Point to point interconnectPoint to point interconnect
Point to point interconnect
 
Transport layer.pptx
Transport layer.pptxTransport layer.pptx
Transport layer.pptx
 
Osi model
Osi modelOsi model
Osi model
 
Routing Protocols
Routing ProtocolsRouting Protocols
Routing Protocols
 
3
33
3
 
08 coms 525 tcpip - tcp 1
08   coms 525 tcpip - tcp 108   coms 525 tcpip - tcp 1
08 coms 525 tcpip - tcp 1
 
lecture 2.pptx
lecture 2.pptxlecture 2.pptx
lecture 2.pptx
 
basics of computer network
basics of computer networkbasics of computer network
basics of computer network
 
A distributed three hop routing protocol to increase the
A distributed three hop routing protocol to increase theA distributed three hop routing protocol to increase the
A distributed three hop routing protocol to increase the
 
CH03 COMBUTER 000000000000000000000.pptx
CH03 COMBUTER 000000000000000000000.pptxCH03 COMBUTER 000000000000000000000.pptx
CH03 COMBUTER 000000000000000000000.pptx
 
Switching techniques
Switching techniquesSwitching techniques
Switching techniques
 
Switching techniques
Switching techniquesSwitching techniques
Switching techniques
 

Recently uploaded

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 

Recently uploaded (20)

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 

Scalable multiprocessors

  • 2. SCALABILITY • Almost all computers allow the capability of the system to be increased in some form, for example by adding memory, I/O cards, disks or upgraded processor(s), but the increase typically has hard limits • A scalable system attempts to avoid inherent design limits on the extent to which resources can be added to the system • Four aspects of scalability: – How does the bandwidth or throughput of the system increase with additional processors? – How does the latency or time per operation increase? – How does the cost of the system increase? – How do we actually package the systems and put them together
  • 3. Bandwidth Scaling • If a large number of processors are to exchange information simultaneously with many other processors or memories, a large number of independent wires must connect them. • Thus scalable machines must be organized in the manner shown in figure (next slide) where a large number of processor modules and memory modules are connected together by independent wires through a large number of switches
  • 4.
  • 5. • A switch may be realized by a bus, a crossbar or even a collection of multiplexers • The number of outputs (or inputs) to the switch is called degree of the switch • Switches are limited in scale but may be interconnected to form large configurations, that is, networks • Controllers are also available to determine which inputs are to be connected to which outputs at each instant in time • A network switch is a more general-purpose device, in which the information presented at the input is enough for the switch controller to determine the proper output without consulting all the nodes • Pairs of modules are connected by routes through network switches
  • 6. • The most common structure for scalable machines is illustrated by the generic architecture shown in fig (next slide) • Here one or more processors are packaged together with one or more memory modules and a communication assist as an easily replicated unit, which is called a node • The intranode switch is typically a high- performance bus
  • 7.
  • 8. In dancehall configuration processing nodes are separated from memory nodes by the network
  • 9. • If the memory modules are on the opposite side of the interconnect, as in fig (previous slide) the network bandwidth requirement scales linearly with the number of processors, even when no communication occurs between processes • Providing adequate bandwidth scaling may not be enough for the computational performance to scale perfectly since the access latency increases with the number of processors • By distributing the memories across the processors, all processes can access local memory with fixed latency, independent of the number of processors; thus the computational performance of the system can scale perfectly
The following assumptions are made to achieve scalable bandwidth:
• It must be possible to have a very large number of concurrent transactions using different wires
• They are initiated independently and without global arbitration
• The effects of a transaction (such as changes of state) are directly visible only to the nodes involved in the transaction
• The effects may eventually become visible to other nodes as they are propagated by additional transactions
• Although it is possible to broadcast information to all nodes, broadcast bandwidth (i.e., the rate at which broadcasts can be performed) does not increase with the number of nodes
Latency Scaling
• The time to transfer n bytes between two nodes is given by
T(n) = Overhead + Channel Time + Routing Delay
where
– Overhead is the processing time in initiating or completing the transfer
– Channel Time is n/B, where B is the bandwidth of the thinnest channel
– Routing Delay is a function f(H, n) of the number of routing steps, or hops, H in the transfer and the number of bytes transferred
Prob 7.1: Many classic networks are constructed out of fixed-degree switches in a configuration, or topology, such that for n nodes the distance from any network input to any network output is log2 n, and the total number of switches is αn log n for some small constant α. Assume the overhead is 1 µs per message, the link bandwidth is 64 MB/s, and the router delay is 200 ns per hop. How much does the time for a 128-byte transfer increase as the machine is scaled from 64 to 1,024 nodes?
Solution: At 64 nodes, six hops are required, so
T64(128) = 1 µs + 128 B / (64 MB/s) + 6 × 200 ns = 1 µs + 2 µs + 1.2 µs = 4.2 µs
This increases to
T1024(128) = 1 µs + 2 µs + 10 × 200 ns = 5 µs
on a 1,024-node configuration. Thus, the latency increases by less than 20% with a 16-fold increase in machine size. Even with this small transfer size, a store-and-forward delay would add 2 µs (the time to buffer 128 bytes) to the routing delay per hop. Thus the latency would be 16.2 µs at 64 nodes and 25 µs at 1,024 nodes.
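The arithmetic above can be checked with a small sketch of the latency model; the function and parameter names are mine, with the slide's values (1 µs overhead, 64 MB/s links, 200 ns per hop, log2 n hops) as defaults:

```python
import math

def transfer_time_us(n_bytes, n_nodes, overhead_us=1.0,
                     bandwidth_mb_s=64, hop_delay_us=0.2,
                     store_and_forward=False):
    """T(n) = overhead + channel time + routing delay."""
    channel_us = n_bytes / bandwidth_mb_s   # n/B: 64 MB/s == 64 bytes/us
    hops = int(math.log2(n_nodes))          # network diameter is log2(n)
    # Store-and-forward buffers the whole packet at every hop,
    # adding the 2 us channel time to each hop's delay.
    per_hop_us = hop_delay_us + (channel_us if store_and_forward else 0.0)
    return overhead_us + channel_us + hops * per_hop_us

print(round(transfer_time_us(128, 64), 1))     # cut-through, 64 nodes: 4.2
print(round(transfer_time_us(128, 1024), 1))   # cut-through, 1,024 nodes: 5.0
print(round(transfer_time_us(128, 64, store_and_forward=True), 1))    # 16.2
print(round(transfer_time_us(128, 1024, store_and_forward=True), 1))  # 25.0
```

The comparison makes the design point concrete: with cut-through routing the per-hop cost is small, so scaling the machine 16-fold barely moves the latency, while store-and-forward multiplies the channel time by the hop count.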
Cost Scaling
• The cost of a machine may be viewed as a fixed cost for the system infrastructure plus an incremental cost of adding processors and memory to the system:
Cost(p, m) = Fixed Cost + Incremental Cost(p, m)
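A minimal sketch of this linear cost model, with made-up illustrative prices for the fixed infrastructure, per-processor, and per-megabyte components:

```python
def system_cost(p, m, fixed=50_000, per_processor=2_000, per_mb=10):
    """cost(p, m) = fixed cost + incremental cost(p, m)."""
    return fixed + p * per_processor + m * per_mb

# Because the fixed infrastructure cost is amortized over more nodes,
# doubling the processors and memory less than doubles the total cost.
small = system_cost(32, 2048)
large = system_cost(64, 4096)
print(large / small)   # ratio is below 2
```
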
Realizing Programming Models
• Here we examine what is required to implement programming models on large distributed-memory machines
• These machines have been most strongly associated with message-passing programming models
• Shared address space programming models have become increasingly important and are well represented
• Recall the concept of a communication abstraction, which defines the set of communication primitives provided to the user
• These can be realized directly in hardware, via system software, or through some combination of the two, as shown in the figure below
• In large-scale parallel machines the programming model is realized in a similar manner, except that the primitive events are transactions across the network, that is, network transactions rather than bus transactions
• A network transaction is a one-way transfer of information from an output buffer at the source to an input buffer at the destination that causes some kind of action at the destination, the occurrence of which is not directly visible at the source, as shown in the figure (next slide)
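The one-way character of a network transaction can be sketched as follows; the classes and method names are illustrative, not part of any real machine. The key point is that `initiate` returns as soon as the packet is in the source's output buffer, and only the destination observes the action:

```python
from collections import deque

class Node:
    """One endpoint of a network transaction (illustrative sketch)."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.output_buffer = deque()   # packets waiting to be serialized
        self.input_buffer = deque()    # packets delivered from the link
        self.log = []

    def initiate(self, dest, payload):
        # One-way transfer: the source only knows the packet entered its
        # output buffer, not whether or when it arrived.
        self.output_buffer.append((dest, payload))

    def drain_input(self):
        # The "action" at the destination: here, just record the payload.
        while self.input_buffer:
            self.log.append(self.input_buffer.popleft())

def deliver(nodes):
    """Move every buffered packet to its destination's input buffer."""
    for node in nodes:
        while node.output_buffer:
            dest, payload = node.output_buffer.popleft()
            nodes[dest].input_buffer.append(payload)

nodes = [Node(i) for i in range(4)]
nodes[0].initiate(dest=3, payload="hello")   # source's view ends here
deliver(nodes)
nodes[3].drain_input()
print(nodes[3].log)                          # only node 3 sees the effect
```
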
Primitive Network Transactions
• Before starting a bus transaction, a protection check is performed as part of the virtual-to-physical address translation
• The format of information in a bus transaction is determined by the physical wires of the bus, i.e., the data lines, address lines, and command lines
• The information to be transferred onto the bus is held in special output registers (the address, command, and data registers) until it can be driven onto the bus
• A bus transaction begins with arbitration for the medium
• Most buses employ a global arbitration scheme in which a processor requesting a transaction asserts a bus request line and waits for the corresponding bus grant
• The destination of the transaction is implicit in the address
• Each module on the bus is configured to respond to a set of physical addresses
• All modules examine the address, and one responds to the transaction
• If none responds, the bus controller detects a time-out and aborts the transaction
• Each module includes a set of input registers capable of buffering any request to which it might respond
• Each bus transaction involves a request followed by a response
• In the case of a read, the response is the data and an associated completion signal
• For a write, it is just the completion acknowledgement
• In either case, both the source and the destination are informed of the completion of the transaction
• In split-transaction buses, the response phase of the transaction may require rearbitration and may be performed in a different order than the requests
• Care is required to avoid deadlock with split transactions because a module on the bus may be both requesting and servicing transactions
• The module must continue servicing bus requests and accepting replies while it is attempting to present its own request
• The bus design ensures that, for any transaction that might be placed on the bus, sufficient input buffering exists to accept the transaction at the destination
• This can be accomplished by providing enough resources or by adding a negative acknowledgement (NACK) signal
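The NACK alternative can be sketched with a module whose input buffer is bounded: a request that cannot be buffered is refused rather than stalling the bus, and the source retries after the module has drained a request. The class and capacity below are illustrative:

```python
from collections import deque

class BusModule:
    """A bus module with a bounded input buffer (illustrative sketch)."""
    def __init__(self, capacity=2):
        self.inbox = deque()
        self.capacity = capacity

    def present(self, request):
        """Return True if accepted, False for a NACK."""
        if len(self.inbox) >= self.capacity:
            return False              # NACK: source must rearbitrate and retry
        self.inbox.append(request)
        return True

    def service_one(self):
        # The module keeps servicing requests even while it may have its
        # own request outstanding, which is what avoids deadlock.
        return self.inbox.popleft() if self.inbox else None

mod = BusModule(capacity=2)
assert mod.present("read A")
assert mod.present("write B")
assert not mod.present("read C")      # buffer full: NACK
mod.service_one()                     # module drains one request...
assert mod.present("read C")          # ...so the retried request succeeds
```
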
Issues present in a network transaction
• Protection: As the number of components becomes larger, the coupling between components looser, and the individual components more complex, there are limits to how far each component can trust the others to operate correctly. In a scalable system, individual components will often perform checks on the network transaction so that an errant program or a faulty hardware component cannot corrupt other components of the system
• Format: Most network links are narrow, so the information associated with a transaction is transferred as a serial stream. Typical links are a few (1 to 16) bits wide. The format of the transaction is dictated by how the information is serialized onto the link, so there is a great deal of flexibility in this aspect of the design. The information in a network transaction is an envelope with more information inside. The envelope includes the information pertaining to the physical network that is needed to get the packet from its source to its destination port. Some networks are designed to deliver only fixed-size packets; others can deliver variable-size packets.
• Output buffering: The source must provide storage to hold the information that is to be serialized onto the link, either in registers, FIFOs, or memory. Since network transactions are one-way and can potentially be pipelined, it may be desirable to provide a queue of output registers. If the packet format is variable up to some moderate size, a similar approach may be adopted in which each entry in the output buffer is of variable size. If a packet can be quite long, then typically the output controller contains a buffer of descriptors pointing to the data in memory.
• Media arbitration: There is no global arbitration for access to the network, and many network transactions can be initiated simultaneously. Initiation of a network transaction places an implicit claim on resources in the communication path from the source to the destination, as well as on resources at the destination. These resources are potentially shared with other transactions. Local arbitration is performed at the source to determine whether or not to initiate the transaction, and the resources are allocated incrementally as the message moves forward.
• Destination name and routing: The source must be able to specify enough information to cause the transaction to be routed to the appropriate destination. There are many variations in how routing is specified and performed, but basically the source performs a translation from some logical name for the destination to some form of physical address.
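As one concrete (and simplified) instance of such a translation, the sketch below maps a flat logical node id onto an (x, y) position in a 2D mesh and derives the hop-by-hop route with dimension-order (X-then-Y) routing; the function name and topology choice are mine, not something the slides prescribe:

```python
def route_2d_mesh(src, dest, width):
    """Translate logical node ids into a hop-by-hop route on a mesh."""
    sx, sy = src % width, src // width      # logical id -> physical (x, y)
    dx, dy = dest % width, dest // width
    hops = []
    while sx != dx:                         # travel along X first
        sx += 1 if dx > sx else -1
        hops.append((sx, sy))
    while sy != dy:                         # then along Y
        sy += 1 if dy > sy else -1
        hops.append((sx, sy))
    return hops

# Node 0 -> node 15 in a 4x4 mesh: three hops in X, then three in Y.
print(route_2d_mesh(0, 15, width=4))
```
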
• Input buffering: At the destination, the information in the network transaction must be transferred from the physical link into some storage element. This may be simple registers or a queue, or it may be delivered directly into memory. The input buffer is, in some sense, a shared resource used by many remote processors.
• Action: The action taken at the destination may be very simple or complex. In either case, it may involve initiating a response.
• Completion detection: The source has an indication that the transaction has been delivered into the network but usually no indication that it has arrived at its destination. Completion must be inferred from a response, an acknowledgement, or some additional transaction.
• Transaction ordering: In a network, ordering is quite weak. Some networks ensure that a sequence of transactions from a given source to a single destination will be seen in order at the destination; others will not even provide this assurance. In either case, no node can perceive the global order.
• Deadlock avoidance: Most modern networks are deadlock free as long as the modules on the network continue to accept transactions. Within the network, this may require restrictions on the permissible routes or other special precautions.
• Delivery guarantees: A fundamental decision in the design of a scalable network is its behavior when the destination buffer is full. This is clearly an issue on an end-to-end basis, since the source needs to know whether the destination input buffer is available when it attempts to initiate a transaction. It is also an issue on a link-by-link basis within the network itself.
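One common way to give the source that end-to-end knowledge is credit-based flow control: the source tracks how many destination input-buffer slots remain and initiates a transaction only while it holds a credit. The sketch below is a one-direction illustration with invented names, not a description of any particular machine:

```python
class CreditedSource:
    """Source-side, end-to-end flow control (illustrative sketch)."""
    def __init__(self, credits):
        self.credits = credits     # known free slots in the destination buffer
        self.sent = []

    def try_send(self, packet):
        if self.credits == 0:
            return False           # destination buffer may be full: hold off
        self.credits -= 1          # one slot is now implicitly claimed
        self.sent.append(packet)
        return True

    def credit_returned(self):
        self.credits += 1          # destination drained an input-buffer slot

src = CreditedSource(credits=1)
assert src.try_send("t1")
assert not src.try_send("t2")      # no credit: would overflow the destination
src.credit_returned()
assert src.try_send("t2")          # credit restored, transaction proceeds
```
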
• Realizing the shared address space communication abstraction requires a two-way request-response protocol, as shown in the figure (previous slide)
• A global address is decomposed into a module number and a local address
• For a read operation, a request is sent to the designated module requesting a load of the desired address and specifying enough information to allow the result to be returned to the requestor through a response network transaction
• A write is similar, except that the data is conveyed with the address and command to the designated module, and the response is merely an acknowledgement to the requestor that the write has been performed
• The response informs the source that the request has been received or serviced, depending on whether it is generated before or after the remote action
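The request-response protocol can be sketched as follows; the address split (4 module bits, 20 local bits) and all names are illustrative assumptions. A read request comes back with a data reply, while a write request comes back with a bare acknowledgement:

```python
MODULE_BITS = 4           # illustrative global-address layout
LOCAL_BITS = 20

def decompose(global_addr):
    """Split a global address into (module number, local address)."""
    return global_addr >> LOCAL_BITS, global_addr & ((1 << LOCAL_BITS) - 1)

class MemoryModule:
    def __init__(self):
        self.mem = {}

    def handle(self, op, local_addr, data=None):
        if op == "read":
            return ("reply", self.mem.get(local_addr, 0))   # data response
        self.mem[local_addr] = data
        return ("ack", None)        # write response: acknowledgement only

modules = [MemoryModule() for _ in range(1 << MODULE_BITS)]

def remote_write(global_addr, value):
    module, local = decompose(global_addr)
    return modules[module].handle("write", local, value)    # request + ack

def remote_read(global_addr):
    module, local = decompose(global_addr)
    return modules[module].handle("read", local)            # request + reply

remote_write(0x3_00010, 42)        # request transaction to module 3
print(remote_read(0x3_00010))      # response transaction carries the data
```
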
• A send/receive pair in the message-passing model is conceptually a one-way transfer from a source area specified by the source user process to a destination area specified by the destination user process
• In addition, it embodies a pairwise synchronization event between the two processes
• The Message Passing Interface (MPI) distinguishes when a call to a send or receive function returns from when the message operation completes
• A synchronous send completes once the matching receive has executed, the source data buffer can be reused, and the data is ensured of arriving in the destination receive buffer
• A buffered send completes as soon as the source data buffer can be reused, independent of whether the matching receive has been issued; the data may have been transmitted, or it may be buffered somewhere in the system
• Buffered send completion is asynchronous with respect to the receiver process
• A receive completes when the message data is present in the receive destination buffer
• A blocking function, send or receive, returns only after the message operation completes
• A nonblocking function returns immediately, regardless of message completion, and additional calls to a probe function are used to detect completion
• The protocols are concerned only with message operation and completion, regardless of whether the functions are blocking
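The two completion rules can be contrasted in a deliberately simplified, single-process sketch (no real concurrency, invented class names): a buffered send completes as soon as the data is copied out of the user buffer, while a synchronous send remains pending until the matching receive executes:

```python
class MessageSystem:
    """Toy model of send-completion semantics (illustrative sketch)."""
    def __init__(self):
        self.system_buffer = []    # staging area for buffered sends
        self.pending_sync = []     # synchronous sends awaiting a match

    def buffered_send(self, data):
        self.system_buffer.append(list(data))   # copy: source buffer reusable
        return "complete"                       # completes immediately

    def synchronous_send(self, data):
        self.pending_sync.append(data)          # no copy; must not reuse yet
        return "pending"                        # completes at matching receive

    def receive(self):
        if self.system_buffer:
            return self.system_buffer.pop(0)
        return self.pending_sync.pop(0)         # match completes the sync send

ms = MessageSystem()
assert ms.buffered_send([1, 2]) == "complete"    # source buffer reusable now
assert ms.synchronous_send([3, 4]) == "pending"  # not complete until matched
assert ms.receive() == [1, 2]
assert ms.receive() == [3, 4]
```

A blocking send would simply not return until its operation reached the "complete" state; a nonblocking send returns at once and the program polls (probes) for completion.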