SlideShare a Scribd company logo
Parallel Programming
Platforms
Physical Organization of Parallel
Platforms
Physical Organization of Parallel Platforms
• we discuss the physical architecture of parallel machines
• We start with an ideal architecture
• Then discuss some conventional architectures
Architecture of an Ideal Parallel Computer
• Parallel Random Access Machine (PRAM)
• A natural extension of the serial model of computation (the Random Access
Machine, or RAM) consists of p processors and a global memory of
unbounded size that is uniformly accessible to all processors.
• All processors access the same address space.
• Processors share a common clock but may execute different instructions in
each cycle.
PRAM
PRAMs can be divided into four subclasses (based on how simultaneous
memory accesses are handled)
• Exclusive-read, exclusive-write (EREW) PRAM
• Every memory cell can be read or written to by only one processor at a time
• Concurrent-read, exclusive-write (CREW) PRAM
• Multiple processors can read a memory cell but only one can write at a time
• Exclusive-read, concurrent-write (ERCW) PRAM
• Multiple write accesses are allowed to a memory location, but multiple read
accesses are serialized
• Concurrent-read, concurrent-write (CRCW) PRAM
• This class allows multiple read and write accesses to a common memory
location. This is the most powerful PRAM model
PRAM
• Allowing concurrent read access does not create any semantic
discrepancies in the program.
• Concurrent write access to a memory location requires arbitration
• Several protocols are used to resolve concurrent writes
• Common  if all the values that the processors are attempting to write are
identical
• Arbitrary  an arbitrary processor is allowed to proceed with the write
operation and the rest fail
• Priority  processors are organized into a predefined prioritized list
• Sum  the sum of all the quantities is written
Interconnection Networks for Parallel
Computers
• Interconnection networks provide mechanisms for data transfer
between processing nodes or between processors and memory
modules.
• Interconnection networks can be classified as static (direct
networks) or dynamic (indirect networks).
Network Topologies
• A wide variety of network topologies have been used in
interconnection networks.
• These topologies try to trade off cost and scalability with
performance.
Network Topologies - Bus-Based Networks
• A bus-based network is perhaps the simplest network consisting of a
shared medium that is common to all the nodes.
• Buses are also ideal for broadcasting information among nodes.
• Typical bus based machines are limited to dozens of nodes.
• Sun Enterprise servers and Intel Pentium based shared-bus
multiprocessors are examples of such architectures.
Bus-based
interconnects with
no local caches
Bus-based
interconnects
with local
memory/caches
Network Topologies - Crossbar Networks
• A simple way to connect p processors to b memory banks is to use a
crossbar network.
• A crossbar network employs a grid of switches or switching nodes
Network Topologies - multistage interconnection
networks
• The crossbar interconnection network is scalable in terms of
performance but unscalable in terms of cost.
• The shared bus network is scalable in terms of cost but unscalable in
terms of performance.
• An intermediate class of networks called multistage interconnection
networks lies between these two extremes.
• A multistage network consisting of p processing nodes and b memory
banks.
• A commonly used multistage connection network is the omega
network.
Network Topologies - multistage interconnection
networks
• This network consists of log p stages, where p is the number of inputs
(processing nodes) and also the number of outputs (memory banks).
• A link exists between input i and output j if the following is true
represents a left-rotation
operation on the binary
representation of i to
obtain j (This
interconnection pattern is
called a perfect shuffle. )
Network Topologies - multistage interconnection
networks
• Perfect shuffle interconnection pattern for eight inputs and outputs
Network Topologies - multistage interconnection
networks
• At each stage of an omega network, a perfect shuffle interconnection
pattern feeds into a set of p/2 switches or switching nodes.
• Each switch is in one of two connection modes.
• pass-through connection
• cross-over connection
Network Topologies - multistage interconnection
networks
A complete omega network connecting eight inputs and eight outputs
• An omega network has p/2 x log p switching
nodes
8 processors 8 memory
banks
• Let s be the binary representation of a
processor that needs to write some data
into memory bank t.
• The data traverses the link to the first
switching node.
• If the most significant bits of s and t are
the same, then the data is routed in
pass-through mode by the switch.
• If bits are differentcrossover mode.
• This scheme is repeated at the next
switching stage using the next most
significant bit
Network Topologies - multistage interconnection
networks
Data routing over an omega network from processor two (010)
to memory bank seven (111) and from processor six (110) to
memory bank four (100)
• When processor two (010) is communicating with
memory bank seven (111), it blocks the path from
processor six (110) to memory bank four (100).
• In an omega network, access to a memory bank
by a processor may disallow access to another
memory bank by another processor.
• Networks with this property are referred to as
blocking networks.
Communication link AB is used
by both communication paths.
Network Topologies - Completely-Connected
Network
• In a completely-connected network, each node has a direct
communication link to every other node in the network.
• This network is ideal in the sense that a node can send a message to
another node in a single step, since a communication link exists
between them.
• Completely-connected networks are the static counterparts of
crossbar switching networks.
Network Topologies : Star-Connected
Network
• In a star-connected network, one processor acts as the central
processor.
• The star-connected network is similar to bus-based networks.
• The central processor is the bottleneck in the star topology.
• Communication between any pair of processors is routed through the
central processor, just as the shared bus forms the medium for all
communication in a bus-based network.
Network Topologies : Linear Arrays, Meshes,
and k-d Meshes
• Due to the large number of links in completely connected networks,
sparser networks are typically used to build parallel computers.
• A linear array is a static network in which each node (except the two
nodes at the ends) has two neighbors, one each to its left and right.
with no wraparound links
With wraparound link
Network Topologies : Linear Arrays, Meshes,
and k-d Meshes
• A two-dimensional mesh is an extension of the
linear array to two dimensions.
• Two dimensional meshes can be augmented with
wraparound links to form two dimensional tori.
• The three dimensional cube is a generalization of
the 2-D mesh to three dimensions.
A variety of physical
simulations commonly
executed on parallel can be
mapped naturally to 3-D
network topologies. For this
reason, 3-D cubes are used
commonly in interconnection
networks for parallel
computers (for example,
in the Cray T3E)
2-D mesh with no wraparound
2-D mesh with wraparound link (2-D torus)
a 3-D mesh with no wraparound
Network Topologies : Linear Arrays, Meshes,
and k-d Meshes
• The k-d meshes refers to the class of topologies consisting of d
dimensions with k nodes along each dimension
• A linear array forms one extreme of the k-d mesh family
• The other extreme is formed by an interesting topology called the
hypercube
• A zero-dimensional hypercube consists of 20, one node.
0-D hypercube
Network Topologies : Linear Arrays, Meshes,
and k-d Meshes
• A one-dimensional hypercube is constructed from two zero-
dimensional hypercubes by connecting them.
• A two-dimensional hypercube of four nodes is constructed from two
onedimensional hypercubes by connecting corresponding nodes.
1-D hypercube 2-D hypercube
Network Topologies : Linear Arrays, Meshes,
and k-d Meshes
• In general a d-dimensional hypercube is constructed by connecting
corresponding nodes of two (d - 1) dimensional hypercubes.
3-D hypercube
4-D hypercube
This numbering scheme has
the useful property that the
minimum distance between
two nodes is given by the
number of bits that are
different in the two labels
Network Topologies : Tree-Based Networks
• A tree network is one in which there is only one path between any
pair of nodes.
• Both linear arrays and star-connected networks are special cases of
tree networks.
• Static tree networks have a processing element at each node of the
tree.
• In a dynamic tree network, nodes at intermediate levels are
switching nodes and the leaf nodes are processing elements
Network Topologies : Tree-Based Networks
• Tree networks suffer from a communication bottleneck at higher
levels of the tree.
• This problem can be alleviated in dynamic tree networks by increasing
the number of communication links and switching nodes closer to the
root.
Cache Coherence in Multiprocessor Systems
• Interconnection networks provide basic mechanisms for
communicating messages (data), in the case of shared-address-space
computers.
• However, additional hardware is required to keep multiple copies of
data consistent with each other.
• If there exist two copies of the data (in different caches/memory
elements), how do we ensure that different processors operate on
these in a manner that follows predefined semantics?
• The problem of keeping caches in multiprocessor systems coherent is
significantly more complex than in uniprocessor systems.
Cache Coherence in Multiprocessor Systems
Incoherent caches: The caches have different values of a
single address location.
Cache Coherence in Multiprocessor Systems
Cache coherence in multiprocessor systems
Invalidate protocol
Update protocol for shared variables
Cache Coherence in Multiprocessor Systems
• Another important factor affecting the performance of these protocols is
false sharing.
• False sharing refers to the situation in which different processors update
different parts of the same cache-line.
• In an invalidate protocol, when a processor updates its part of the cache-
line, the other copies of this line are invalidated.
• The tradeoff between invalidate and update schemes is the classic tradeoff
between communication overhead (updates) and idling (stalling in
invalidates).
• Current generation cache coherent machines typically rely on invalidate
protocols.
• The rest of our discussion of multiprocessor cache systems therefore
assumes invalidate protocols.
Cache Coherence in Multiprocessor Systems
• Maintaining Coherence Using
Invalidate Protocols.
• There are three states - shared,
invalid, and dirty - that a cache
line goes through.
• The solid lines depict processor
actions and the dashed lines
coherence actions.
Maintaining coherence using a simple three-state protocol
Cache Coherence in Multiprocessor Systems
• Example of parallel program
execution with the simple
three-state coherence
protocol
• Consider an example of two
program segments being
executed by processor P0 and
P1
• The system consists of local
memories (or caches) at
processors P0 and P1, and a
global memory.
• Each data item (variable) is
assumed to be on a different
cache line.
• Initially, the two variables x
and y are tagged dirty and the
only copies of these variables
exist in the global memory
Cache Coherence in Multiprocessor Systems
The implementation of coherence protocols can be carried out using a
variety of hardware mechanisms –
• Snoopy systems
• Directory based systems
Cache Coherence in Multiprocessor Systems
Snoopy Cache Systems
• Snoopy caches are typically associated with multiprocessor systems
based on broadcast interconnection networks such as a bus or a ring.
• all processors snoop on (monitor) the bus for transactions.
Cache Coherence in Multiprocessor Systems
Snoopy Cache Systems
• Each processor's cache has a set of tag bits associated with it that
determine the state of the cache blocks
• Ex:
• when the snoop hardware detects that a read has been issued to a cache
block that it has a dirty copy of, it asserts control of the bus and puts the data
out.
• when the snoop hardware detects that a write operation has been issued on
a cache block that it has a copy of, it invalidates the block
Cache Coherence in Multiprocessor Systems
Directory Based Systems
• Consider a simple system in which the global memory is augmented
with a directory that maintains a bitmap representing cache-blocks
and the processors at which they are cached.
• These bitmap entries are sometimes referred to as the presence bits.
• Only processors that hold a particular block (or are reading it)
participate in the state transitions due to coherence operations.
Cache Coherence in
Multiprocessor Systems
• Architecture of
typical directory
based systems: a
centralized directory
Cache Coherence in
Multiprocessor Systems
• Architecture
of typical
directory
based
systems: a
distributed
directory
Summary

More Related Content

What's hot

advanced computer architesture-conditions of parallelism
advanced computer architesture-conditions of parallelismadvanced computer architesture-conditions of parallelism
advanced computer architesture-conditions of parallelism
Pankaj Kumar Jain
 
Communication model of parallel platforms
Communication model of parallel platformsCommunication model of parallel platforms
Communication model of parallel platforms
Syed Zaid Irshad
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
Roshan Karunarathna
 
Memory consistency models
Memory consistency modelsMemory consistency models
Memory consistency models
palani kumar
 
Parallel computing
Parallel computingParallel computing
Parallel computing
Vinay Gupta
 
Chpt7
Chpt7Chpt7
Chpt7
RohitKeshari
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platforms
Syed Zaid Irshad
 
Memory Management
Memory ManagementMemory Management
Memory Management
DEDE IRYAWAN
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
Akhila Prabhakaran
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed Computing
Sayed Chhattan Shah
 
Parallel computing
Parallel computingParallel computing
Parallel computingvirend111
 
Multithreading
MultithreadingMultithreading
Multithreading
A B Shinde
 
Feng’s classification
Feng’s classificationFeng’s classification
Feng’s classification
Narayan Kandel
 
Parallel computing and its applications
Parallel computing and its applicationsParallel computing and its applications
Parallel computing and its applications
Burhan Ahmed
 
Multiprocessor architecture
Multiprocessor architectureMultiprocessor architecture
Multiprocessor architecture
Arpan Baishya
 
management of distributed transactions
management of distributed transactionsmanagement of distributed transactions
management of distributed transactionsNilu Desai
 
Parallel computing chapter 3
Parallel computing chapter 3Parallel computing chapter 3
Parallel computing chapter 3Md. Mahedi Mahfuj
 
Chapter 6 pc
Chapter 6 pcChapter 6 pc
Chapter 6 pc
Hanif Durad
 
Cache coherence ppt
Cache coherence pptCache coherence ppt
Cache coherence ppt
ArendraSingh2
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)
Fazli Amin
 

What's hot (20)

advanced computer architesture-conditions of parallelism
advanced computer architesture-conditions of parallelismadvanced computer architesture-conditions of parallelism
advanced computer architesture-conditions of parallelism
 
Communication model of parallel platforms
Communication model of parallel platformsCommunication model of parallel platforms
Communication model of parallel platforms
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
Memory consistency models
Memory consistency modelsMemory consistency models
Memory consistency models
 
Parallel computing
Parallel computingParallel computing
Parallel computing
 
Chpt7
Chpt7Chpt7
Chpt7
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platforms
 
Memory Management
Memory ManagementMemory Management
Memory Management
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed Computing
 
Parallel computing
Parallel computingParallel computing
Parallel computing
 
Multithreading
MultithreadingMultithreading
Multithreading
 
Feng’s classification
Feng’s classificationFeng’s classification
Feng’s classification
 
Parallel computing and its applications
Parallel computing and its applicationsParallel computing and its applications
Parallel computing and its applications
 
Multiprocessor architecture
Multiprocessor architectureMultiprocessor architecture
Multiprocessor architecture
 
management of distributed transactions
management of distributed transactionsmanagement of distributed transactions
management of distributed transactions
 
Parallel computing chapter 3
Parallel computing chapter 3Parallel computing chapter 3
Parallel computing chapter 3
 
Chapter 6 pc
Chapter 6 pcChapter 6 pc
Chapter 6 pc
 
Cache coherence ppt
Cache coherence pptCache coherence ppt
Cache coherence ppt
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)
 

Similar to Lecture 3 parallel programming platforms

Multiprocessor
MultiprocessorMultiprocessor
Multiprocessor
Satvik Khurana
 
1-introduction-to-computer-networking.pdf
1-introduction-to-computer-networking.pdf1-introduction-to-computer-networking.pdf
1-introduction-to-computer-networking.pdf
sophia763824
 
1-introduction-to-computer-networking.pdf
1-introduction-to-computer-networking.pdf1-introduction-to-computer-networking.pdf
1-introduction-to-computer-networking.pdf
JamesOtieno34
 
Lecture 1_Introduction to Networking_1.ppt
Lecture 1_Introduction to Networking_1.pptLecture 1_Introduction to Networking_1.ppt
Lecture 1_Introduction to Networking_1.ppt
flyinimohamed
 
1-introduction-to-computer-networking-converted 2.pptx
1-introduction-to-computer-networking-converted 2.pptx1-introduction-to-computer-networking-converted 2.pptx
1-introduction-to-computer-networking-converted 2.pptx
Yashwant Srikrishnan
 
Computer network
Computer networkComputer network
Computer network
ASHIK MAHMUD
 
Networking presentation
Networking presentationNetworking presentation
Networking presentation
Jyoti Tewari
 
Networking and Internetworking Devices
Networking and Internetworking DevicesNetworking and Internetworking Devices
Networking and Internetworking Devices
21viveksingh
 
Lecture 1 Computer Networks.pptxsecond course
Lecture 1 Computer Networks.pptxsecond courseLecture 1 Computer Networks.pptxsecond course
Lecture 1 Computer Networks.pptxsecond course
LamiyQurbanlKomptert
 
computer networking and its application ppt
computer networking and its application pptcomputer networking and its application ppt
computer networking and its application ppt
Nitesh Dubey
 
Web applications and security. By: Partha Jee Chauhan, MSc Computer Science
Web applications and security. By: Partha Jee Chauhan, MSc Computer ScienceWeb applications and security. By: Partha Jee Chauhan, MSc Computer Science
Web applications and security. By: Partha Jee Chauhan, MSc Computer Science
Partha Jee Chauhan
 
Unit 1 Information Technology for student👩‍🎓.pdf
Unit 1 Information Technology for student👩‍🎓.pdfUnit 1 Information Technology for student👩‍🎓.pdf
Unit 1 Information Technology for student👩‍🎓.pdf
shuklagyanprakash578
 
BIL406-Chapter-5-Network Structures.ppt
BIL406-Chapter-5-Network Structures.pptBIL406-Chapter-5-Network Structures.ppt
BIL406-Chapter-5-Network Structures.ppt
Kadri20
 
Unit2.2
Unit2.2Unit2.2
Unit2.2
Anshumali Singh
 
Computer Network
Computer NetworkComputer Network
Computer Network
Subhash Vadadoriya
 
Module 1 Introduction to Computer Networks.pptx
Module 1 Introduction to Computer Networks.pptxModule 1 Introduction to Computer Networks.pptx
Module 1 Introduction to Computer Networks.pptx
AASTHAJAJOO
 
Connecting devices
Connecting devicesConnecting devices
Connecting devices
Sudhasini Baskaran
 
Topology,Switching and Routing
Topology,Switching and RoutingTopology,Switching and Routing
Topology,Switching and Routing
Anushiya Ram
 
Aarambhcompute rnetworking assignment
Aarambhcompute rnetworking assignmentAarambhcompute rnetworking assignment
Aarambhcompute rnetworking assignment
AARAMBH PANDEY
 
Network hardware essentials Lec#3
Network hardware essentials Lec#3Network hardware essentials Lec#3
Network hardware essentials Lec#3
Punjab and Superior College, Pakpattan
 

Similar to Lecture 3 parallel programming platforms (20)

Multiprocessor
MultiprocessorMultiprocessor
Multiprocessor
 
1-introduction-to-computer-networking.pdf
1-introduction-to-computer-networking.pdf1-introduction-to-computer-networking.pdf
1-introduction-to-computer-networking.pdf
 
1-introduction-to-computer-networking.pdf
1-introduction-to-computer-networking.pdf1-introduction-to-computer-networking.pdf
1-introduction-to-computer-networking.pdf
 
Lecture 1_Introduction to Networking_1.ppt
Lecture 1_Introduction to Networking_1.pptLecture 1_Introduction to Networking_1.ppt
Lecture 1_Introduction to Networking_1.ppt
 
1-introduction-to-computer-networking-converted 2.pptx
1-introduction-to-computer-networking-converted 2.pptx1-introduction-to-computer-networking-converted 2.pptx
1-introduction-to-computer-networking-converted 2.pptx
 
Computer network
Computer networkComputer network
Computer network
 
Networking presentation
Networking presentationNetworking presentation
Networking presentation
 
Networking and Internetworking Devices
Networking and Internetworking DevicesNetworking and Internetworking Devices
Networking and Internetworking Devices
 
Lecture 1 Computer Networks.pptxsecond course
Lecture 1 Computer Networks.pptxsecond courseLecture 1 Computer Networks.pptxsecond course
Lecture 1 Computer Networks.pptxsecond course
 
computer networking and its application ppt
computer networking and its application pptcomputer networking and its application ppt
computer networking and its application ppt
 
Web applications and security. By: Partha Jee Chauhan, MSc Computer Science
Web applications and security. By: Partha Jee Chauhan, MSc Computer ScienceWeb applications and security. By: Partha Jee Chauhan, MSc Computer Science
Web applications and security. By: Partha Jee Chauhan, MSc Computer Science
 
Unit 1 Information Technology for student👩‍🎓.pdf
Unit 1 Information Technology for student👩‍🎓.pdfUnit 1 Information Technology for student👩‍🎓.pdf
Unit 1 Information Technology for student👩‍🎓.pdf
 
BIL406-Chapter-5-Network Structures.ppt
BIL406-Chapter-5-Network Structures.pptBIL406-Chapter-5-Network Structures.ppt
BIL406-Chapter-5-Network Structures.ppt
 
Unit2.2
Unit2.2Unit2.2
Unit2.2
 
Computer Network
Computer NetworkComputer Network
Computer Network
 
Module 1 Introduction to Computer Networks.pptx
Module 1 Introduction to Computer Networks.pptxModule 1 Introduction to Computer Networks.pptx
Module 1 Introduction to Computer Networks.pptx
 
Connecting devices
Connecting devicesConnecting devices
Connecting devices
 
Topology,Switching and Routing
Topology,Switching and RoutingTopology,Switching and Routing
Topology,Switching and Routing
 
Aarambhcompute rnetworking assignment
Aarambhcompute rnetworking assignmentAarambhcompute rnetworking assignment
Aarambhcompute rnetworking assignment
 
Network hardware essentials Lec#3
Network hardware essentials Lec#3Network hardware essentials Lec#3
Network hardware essentials Lec#3
 

More from Vajira Thambawita

Lecture 2 more about parallel computing
Lecture 2   more about parallel computingLecture 2   more about parallel computing
Lecture 2 more about parallel computing
Vajira Thambawita
 
Lecture 12 localization and navigation
Lecture 12 localization and navigationLecture 12 localization and navigation
Lecture 12 localization and navigation
Vajira Thambawita
 
Lecture 11 neural network principles
Lecture 11 neural network principlesLecture 11 neural network principles
Lecture 11 neural network principles
Vajira Thambawita
 
Lecture 10 mobile robot design
Lecture 10 mobile robot designLecture 10 mobile robot design
Lecture 10 mobile robot design
Vajira Thambawita
 
Lecture 09 control
Lecture 09 controlLecture 09 control
Lecture 09 control
Vajira Thambawita
 
Lecture 08 robots and controllers
Lecture 08 robots and controllersLecture 08 robots and controllers
Lecture 08 robots and controllers
Vajira Thambawita
 
Lecture 07 more about pic
Lecture 07 more about picLecture 07 more about pic
Lecture 07 more about pic
Vajira Thambawita
 
Lecture 06 pic programming in c
Lecture 06 pic programming in cLecture 06 pic programming in c
Lecture 06 pic programming in c
Vajira Thambawita
 
Lecture 05 pic io port programming
Lecture 05 pic io port programmingLecture 05 pic io port programming
Lecture 05 pic io port programming
Vajira Thambawita
 
Lecture 04 branch call and time delay
Lecture 04  branch call and time delayLecture 04  branch call and time delay
Lecture 04 branch call and time delay
Vajira Thambawita
 
Lecture 03 basics of pic
Lecture 03 basics of picLecture 03 basics of pic
Lecture 03 basics of pic
Vajira Thambawita
 
Lecture 02 mechatronics systems
Lecture 02 mechatronics systemsLecture 02 mechatronics systems
Lecture 02 mechatronics systems
Vajira Thambawita
 
Lecture 1 - Introduction to embedded system and Robotics
Lecture 1 - Introduction to embedded system and RoboticsLecture 1 - Introduction to embedded system and Robotics
Lecture 1 - Introduction to embedded system and Robotics
Vajira Thambawita
 
Lec 09 - Registers and Counters
Lec 09 - Registers and CountersLec 09 - Registers and Counters
Lec 09 - Registers and Counters
Vajira Thambawita
 
Lec 08 - DESIGN PROCEDURE
Lec 08 - DESIGN PROCEDURELec 08 - DESIGN PROCEDURE
Lec 08 - DESIGN PROCEDURE
Vajira Thambawita
 
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITSLec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
Vajira Thambawita
 
Lec 06 - Synchronous Sequential Logic
Lec 06 - Synchronous Sequential LogicLec 06 - Synchronous Sequential Logic
Lec 06 - Synchronous Sequential Logic
Vajira Thambawita
 
Lec 05 - Combinational Logic
Lec 05 - Combinational LogicLec 05 - Combinational Logic
Lec 05 - Combinational Logic
Vajira Thambawita
 
Lec 04 - Gate-level Minimization
Lec 04 - Gate-level MinimizationLec 04 - Gate-level Minimization
Lec 04 - Gate-level Minimization
Vajira Thambawita
 
Lec 03 - Combinational Logic Design
Lec 03 - Combinational Logic DesignLec 03 - Combinational Logic Design
Lec 03 - Combinational Logic Design
Vajira Thambawita
 

More from Vajira Thambawita (20)

Lecture 2 more about parallel computing
Lecture 2   more about parallel computingLecture 2   more about parallel computing
Lecture 2 more about parallel computing
 
Lecture 12 localization and navigation
Lecture 12 localization and navigationLecture 12 localization and navigation
Lecture 12 localization and navigation
 
Lecture 11 neural network principles
Lecture 11 neural network principlesLecture 11 neural network principles
Lecture 11 neural network principles
 
Lecture 10 mobile robot design
Lecture 10 mobile robot designLecture 10 mobile robot design
Lecture 10 mobile robot design
 
Lecture 09 control
Lecture 09 controlLecture 09 control
Lecture 09 control
 
Lecture 08 robots and controllers
Lecture 08 robots and controllersLecture 08 robots and controllers
Lecture 08 robots and controllers
 
Lecture 07 more about pic
Lecture 07 more about picLecture 07 more about pic
Lecture 07 more about pic
 
Lecture 06 pic programming in c
Lecture 06 pic programming in cLecture 06 pic programming in c
Lecture 06 pic programming in c
 
Lecture 05 pic io port programming
Lecture 05 pic io port programmingLecture 05 pic io port programming
Lecture 05 pic io port programming
 
Lecture 04 branch call and time delay
Lecture 04  branch call and time delayLecture 04  branch call and time delay
Lecture 04 branch call and time delay
 
Lecture 03 basics of pic
Lecture 03 basics of picLecture 03 basics of pic
Lecture 03 basics of pic
 
Lecture 02 mechatronics systems
Lecture 02 mechatronics systemsLecture 02 mechatronics systems
Lecture 02 mechatronics systems
 
Lecture 1 - Introduction to embedded system and Robotics
Lecture 1 - Introduction to embedded system and RoboticsLecture 1 - Introduction to embedded system and Robotics
Lecture 1 - Introduction to embedded system and Robotics
 
Lec 09 - Registers and Counters
Lec 09 - Registers and CountersLec 09 - Registers and Counters
Lec 09 - Registers and Counters
 
Lec 08 - DESIGN PROCEDURE
Lec 08 - DESIGN PROCEDURELec 08 - DESIGN PROCEDURE
Lec 08 - DESIGN PROCEDURE
 
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITSLec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
 
Lec 06 - Synchronous Sequential Logic
Lec 06 - Synchronous Sequential LogicLec 06 - Synchronous Sequential Logic
Lec 06 - Synchronous Sequential Logic
 
Lec 05 - Combinational Logic
Lec 05 - Combinational LogicLec 05 - Combinational Logic
Lec 05 - Combinational Logic
 
Lec 04 - Gate-level Minimization
Lec 04 - Gate-level MinimizationLec 04 - Gate-level Minimization
Lec 04 - Gate-level Minimization
 
Lec 03 - Combinational Logic Design
Lec 03 - Combinational Logic DesignLec 03 - Combinational Logic Design
Lec 03 - Combinational Logic Design
 

Recently uploaded

Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Steve Thomason
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Nguyen Thanh Tu Collection
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
PedroFerreira53928
 

Recently uploaded (20)

Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 

Lecture 3 parallel programming platforms

  • 1.
  • 3.
  • 4. Physical Organization of Parallel Platforms
  • 5. Physical Organization of Parallel Platforms • we discuss the physical architecture of parallel machines • We start with an ideal architecture • Then discuss some conventional architectures
  • 6.
  • 7. Architecture of an Ideal Parallel Computer • Parallel Random Access Machine (PRAM) • A natural extension of the serial model of computation (the Random Access Machine, or RAM) consists of p processors and a global memory of unbounded size that is uniformly accessible to all processors. • All processors access the same address space. • Processors share a common clock but may execute different instructions in each cycle.
  • 8. PRAM PRAMs can be divided into four subclasses (based on how simultaneous memory accesses are handled) • Exclusive-read, exclusive-write (EREW) PRAM • Every memory cell can be read or written to by only one processor at a time • Concurrent-read, exclusive-write (CREW) PRAM • Multiple processors can read a memory cell but only one can write at a time • Exclusive-read, concurrent-write (ERCW) PRAM • Multiple write accesses are allowed to a memory location, but multiple read accesses are serialized • Concurrent-read, concurrent-write (CRCW) PRAM • This class allows multiple read and write accesses to a common memory location. This is the most powerful PRAM model
  • 9. PRAM • Allowing concurrent read access does not create any semantic discrepancies in the program. • Concurrent write access to a memory location requires arbitration • Several protocols are used to resolve concurrent writes • Common  if all the values that the processors are attempting to write are identical • Arbitrary  an arbitrary processor is allowed to proceed with the write operation and the rest fail • Priority  processors are organized into a predefined prioritized list • Sum  the sum of all the quantities is written
  • 10.
  • 11. Interconnection Networks for Parallel Computers • Interconnection networks provide mechanisms for data transfer between processing nodes or between processors and memory modules. • Interconnection networks can be classified as static (direct networks) or dynamic (indirect networks).
  • 12.
  • 13. Network Topologies • A wide variety of network topologies have been used in interconnection networks. • These topologies try to trade off cost and scalability with performance.
  • 14. Network Topologies - Bus-Based Networks • A bus-based network is perhaps the simplest network consisting of a shared medium that is common to all the nodes. • Buses are also ideal for broadcasting information among nodes. • Typical bus based machines are limited to dozens of nodes. • Sun Enterprise servers and Intel Pentium based shared-bus multiprocessors are examples of such architectures. Bus-based interconnects with no local caches Bus-based interconnects with local memory/caches
  • 15. Network Topologies - Crossbar Networks • A simple way to connect p processors to b memory banks is to use a crossbar network. • A crossbar network employs a grid of switches or switching nodes
  • 16. Network Topologies - multistage interconnection networks • The crossbar interconnection network is scalable in terms of performance but unscalable in terms of cost. • The shared bus network is scalable in terms of cost but unscalable in terms of performance. • An intermediate class of networks called multistage interconnection networks lies between these two extremes. • A multistage network consisting of p processing nodes and b memory banks. • A commonly used multistage connection network is the omega network.
  • 17. Network Topologies - multistage interconnection networks • This network consists of log p stages, where p is the number of inputs (processing nodes) and also the number of outputs (memory banks). • A link exists between input i and output j if the following is true represents a left-rotation operation on the binary representation of i to obtain j (This interconnection pattern is called a perfect shuffle. )
  • 18. Network Topologies - multistage interconnection networks • Perfect shuffle interconnection pattern for eight inputs and outputs
  • 19. Network Topologies - multistage interconnection networks • At each stage of an omega network, a perfect shuffle interconnection pattern feeds into a set of p/2 switches or switching nodes. • Each switch is in one of two connection modes. • pass-through connection • cross-over connection
  • 20. Network Topologies - multistage interconnection networks A complete omega network connecting eight inputs and eight outputs • An omega network has p/2 x log p switching nodes 8 processors 8 memory banks • Let s be the binary representation of a processor that needs to write some data into memory bank t. • The data traverses the link to the first switching node. • If the most significant bits of s and t are the same, then the data is routed in pass-through mode by the switch. • If bits are differentcrossover mode. • This scheme is repeated at the next switching stage using the next most significant bit
  • 21. Network Topologies - multistage interconnection networks Data routing over an omega network from processor two (010) to memory bank seven (111) and from processor six (110) to memory bank four (100) • When processor two (010) is communicating with memory bank seven (111), it blocks the path from processor six (110) to memory bank four (100). • In an omega network, access to a memory bank by a processor may disallow access to another memory bank by another processor. • Networks with this property are referred to as blocking networks. Communication link AB is used by both communication paths.
  • 22. Network Topologies - Completely-Connected Network • In a completely-connected network, each node has a direct communication link to every other node in the network. • This network is ideal in the sense that a node can send a message to another node in a single step, since a communication link exists between them. • Completely-connected networks are the static counterparts of crossbar switching networks.
  • 23. Network Topologies : Star-Connected Network • In a star-connected network, one processor acts as the central processor. • The star-connected network is similar to bus-based networks. • The central processor is the bottleneck in the star topology. • Communication between any pair of processors is routed through the central processor, just as the shared bus forms the medium for all communication in a bus-based network.
  • 24. Network Topologies : Linear Arrays, Meshes, and k-d Meshes • Due to the large number of links in completely connected networks, sparser networks are typically used to build parallel computers. • A linear array is a static network in which each node (except the two nodes at the ends) has two neighbors, one each to its left and right. with no wraparound links With wraparound link
  • 25. Network Topologies : Linear Arrays, Meshes, and k-d Meshes • A two-dimensional mesh is an extension of the linear array to two dimensions. • Two dimensional meshes can be augmented with wraparound links to form two dimensional tori. • The three dimensional cube is a generalization of the 2-D mesh to three dimensions. A variety of physical simulations commonly executed on parallel can be mapped naturally to 3-D network topologies. For this reason, 3-D cubes are used commonly in interconnection networks for parallel computers (for example, in the Cray T3E) 2-D mesh with no wraparound 2-D mesh with wraparound link (2-D torus) a 3-D mesh with no wraparound
  • 26. Network Topologies : Linear Arrays, Meshes, and k-d Meshes • The k-d meshes refers to the class of topologies consisting of d dimensions with k nodes along each dimension • A linear array forms one extreme of the k-d mesh family • The other extreme is formed by an interesting topology called the hypercube • A zero-dimensional hypercube consists of 20, one node. 0-D hypercube
  • 27. Network Topologies : Linear Arrays, Meshes, and k-d Meshes • A one-dimensional hypercube is constructed from two zero- dimensional hypercubes by connecting them. • A two-dimensional hypercube of four nodes is constructed from two onedimensional hypercubes by connecting corresponding nodes. 1-D hypercube 2-D hypercube
  • 28. Network Topologies : Linear Arrays, Meshes, and k-d Meshes • In general a d-dimensional hypercube is constructed by connecting corresponding nodes of two (d - 1) dimensional hypercubes. 3-D hypercube 4-D hypercube This numbering scheme has the useful property that the minimum distance between two nodes is given by the number of bits that are different in the two labels
  • 29. Network Topologies : Tree-Based Networks • A tree network is one in which there is only one path between any pair of nodes. • Both linear arrays and star-connected networks are special cases of tree networks. • Static tree networks have a processing element at each node of the tree. • In a dynamic tree network, nodes at intermediate levels are switching nodes and the leaf nodes are processing elements
  • 30. Network Topologies : Tree-Based Networks • Tree networks suffer from a communication bottleneck at higher levels of the tree. • This problem can be alleviated in dynamic tree networks by increasing the number of communication links and switching nodes closer to the root.
  • 31.
  • 32. Cache Coherence in Multiprocessor Systems • Interconnection networks provide basic mechanisms for communicating messages (data), in the case of shared-address-space computers. • However, additional hardware is required to keep multiple copies of data consistent with each other. • If there exist two copies of the data (in different caches/memory elements), how do we ensure that different processors operate on these in a manner that follows predefined semantics? • The problem of keeping caches in multiprocessor systems coherent is significantly more complex than in uniprocessor systems.
  • 33. Cache Coherence in Multiprocessor Systems Incoherent caches: The caches have different values of a single address location.
  • 34. Cache Coherence in Multiprocessor Systems Cache coherence in multiprocessor systems Invalidate protocol Update protocol for shared variables
  • 35. Cache Coherence in Multiprocessor Systems • Another important factor affecting the performance of these protocols is false sharing. • False sharing refers to the situation in which different processors update different parts of the same cache-line. • In an invalidate protocol, when a processor updates its part of the cache- line, the other copies of this line are invalidated. • The tradeoff between invalidate and update schemes is the classic tradeoff between communication overhead (updates) and idling (stalling in invalidates). • Current generation cache coherent machines typically rely on invalidate protocols. • The rest of our discussion of multiprocessor cache systems therefore assumes invalidate protocols.
  • 36. Cache Coherence in Multiprocessor Systems • Maintaining Coherence Using Invalidate Protocols. • There are three states - shared, invalid, and dirty - that a cache line goes through. • The solid lines depict processor actions and the dashed lines coherence actions. Maintaining coherence using a simple three-state protocol
  • 37. Cache Coherence in Multiprocessor Systems • Example of parallel program execution with the simple three-state coherence protocol • Consider an example of two program segments being executed by processor P0 and P1 • The system consists of local memories (or caches) at processors P0 and P1, and a global memory. • Each data item (variable) is assumed to be on a different cache line. • Initially, the two variables x and y are tagged dirty and the only copies of these variables exist in the global memory
  • 38. Cache Coherence in Multiprocessor Systems The implementation of coherence protocols can be carried out using a variety of hardware mechanisms – • Snoopy systems • Directory based systems
  • 39. Cache Coherence in Multiprocessor Systems Snoopy Cache Systems • Snoopy caches are typically associated with multiprocessor systems based on broadcast interconnection networks such as a bus or a ring. • all processors snoop on (monitor) the bus for transactions.
  • 40. Cache Coherence in Multiprocessor Systems Snoopy Cache Systems • Each processor's cache has a set of tag bits associated with it that determine the state of the cache blocks • Ex: • when the snoop hardware detects that a read has been issued to a cache block that it has a dirty copy of, it asserts control of the bus and puts the data out. • when the snoop hardware detects that a write operation has been issued on a cache block that it has a copy of, it invalidates the block
  • 41. Cache Coherence in Multiprocessor Systems Directory Based Systems • Consider a simple system in which the global memory is augmented with a directory that maintains a bitmap representing cache-blocks and the processors at which they are cached. • These bitmap entries are sometimes referred to as the presence bits. • Only processors that hold a particular block (or are reading it) participate in the state transitions due to coherence operations.
  • 42. Cache Coherence in Multiprocessor Systems • Architecture of typical directory based systems: a centralized directory
  • 43. Cache Coherence in Multiprocessor Systems • Architecture of typical directory based systems: a distributed directory