Unit III
Multiprocessors and Thread-Level Parallelism
By
N.R.Rejin Paul
Lecturer/VIT/CSE
CS2354 Advanced Computer Architecture
Chapter 6. Multiprocessors and
Thread-Level Parallelism
6.1 Introduction
6.2 Characteristics of Application Domains
6.3 Symmetric Shared-Memory Architectures
6.4 Performance of Symmetric Shared-Memory
Multiprocessors
6.5 Distributed Shared-Memory Architectures
6.6 Performance of Distributed Shared-Memory
Multiprocessors
6.7 Synchronization
6.8 Models of Memory Consistency: An Introduction
6.9 Multithreading: Exploiting Thread-Level Parallelism
within a Processor
Taxonomy of Parallel Architectures
Flynn Categories
• SISD (Single Instruction Single Data)
– Uniprocessors
• MISD (Multiple Instruction Single Data)
– no commercial machines of this type; multiple processors operate on a single data stream
• SIMD (Single Instruction Multiple Data)
– same instruction executed by multiple processors using different data streams
• Each processor has its own data memory (hence multiple data)
• There’s a single instruction memory and control processor
– Simple programming model, Low overhead, Flexibility
– (Phrase reused by Intel marketing for media instructions ~ vector)
– Examples: vector architectures, Illiac-IV, CM-2
• MIMD (Multiple Instruction Multiple Data)
– Each processor fetches its own instructions and operates on its own data
– MIMD is the current winner: the major design emphasis is on machines with <= 128 processors
• Use off-the-shelf microprocessors: cost-performance advantages
• Flexible: high performance for one application, running many tasks simultaneously
– Examples: Sun Enterprise 5000, Cray T3D, SGI Origin
MIMD Class 1:
Centralized shared-memory multiprocessor
Processors share a single centralized memory; processors and memory are
interconnected by a bus
• also known as a “uniform memory access” (UMA) machine, because the time to
access memory is the same from every processor, or a
“symmetric (shared-memory) multiprocessor” (SMP)
– A symmetric relationship to all processors
– A uniform memory access time from any processor
• scalability problem: less attractive for large processor counts
MIMD Class 2:
Distributed-memory multiprocessor
Memory modules are physically distributed, with a module local to each CPU
• Advantages:
– cost-effective way to scale memory bandwidth
– lower memory latency for local memory accesses
• Drawbacks
– longer communication latency when data must pass between processors
– the software model is more complex
6.3 Symmetric Shared-Memory Architectures
Each processor has the same relationship to the single memory
usually supports caching of both private data and shared data
Caching in shared-memory machines
• private data: data used by a single processor
– When a private item is cached, its location migrates to the cache
– Since no other processor uses the data, program behavior is identical to that
in a uniprocessor
• shared data: data used by multiple processors
– When shared data are cached, the shared value may be replicated in multiple
caches
– advantages: replication reduces access latency and memory bandwidth demand
– drawback: because each cache may write its copy independently, the values in
different caches may become inconsistent
– this induces a new problem: cache coherence
Coherent caches provide:
• migration: a data item can be moved to a local cache and used there in a
transparent fashion
• replication: shared data that are being simultaneously read get a copy in
each reader’s cache
both are critical to performance in accessing shared data
Multiprocessor Cache Coherence Problem
• Informally:
– “a memory system is coherent if any read returns the most recent write”
– Coherence – defines what value can be returned by a read
– Consistency – determines when a written value will be returned by a read
– The informal definition is too strict and too difficult to implement
• Better:
– Write propagation: a written value must become visible to other caches (“any
write must eventually be seen by a read”)
– Write serialization: all writes are seen in the same order by all caches
• Two rules to ensure this:
– “If P writes x and then P1 reads it, P’s write will be seen by P1 if the read and
write are sufficiently far apart”
– Writes to a single location are serialized: seen in one order
• The latest write will be seen
• Otherwise caches could see writes in an illogical order
(an older value after a newer value)
Example Cache Coherence Problem
– Processors see different values for u after event 3
[Figure: processors P1, P2, P3, each with a private cache ($), attached to a
bus with memory and I/O devices. Events: (1) P1 reads u = 5; (2) P3 reads
u = 5; (3) P3 writes u = 7; (4) P1 reads u again; (5) P2 reads u.]
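The stale-value scenario above can be sketched as a toy model, assuming write-through caches with no invalidation; the `Cache` class and its methods are illustrative, not part of any real simulator:

```python
# Toy model of the stale-copy problem: three caches over one memory location,
# with no coherence protocol at all.

class Cache:
    def __init__(self, memory):
        self.memory = memory   # shared backing store (a dict)
        self.data = {}         # private copy of cached values

    def read(self, addr):
        if addr not in self.data:            # miss: fetch from memory
            self.data[addr] = self.memory[addr]
        return self.data[addr]               # hit: may return a stale value

    def write_through(self, addr, value):
        self.data[addr] = value
        self.memory[addr] = value            # write-through updates memory,
                                             # but NOT the other caches

memory = {"u": 5}
p1, p2, p3 = (Cache(memory) for _ in range(3))

p1.read("u")              # events 1-2: P1 and P3 cache u = 5
p3.read("u")
p3.write_through("u", 7)  # event 3: P3 writes u = 7

# events 4-5: P1 hits on its stale copy, while P2 misses and fetches the new value
assert p1.read("u") == 5
assert p2.read("u") == 7
```

The asserts show the incoherence directly: two processors reading the same location at the same time see different values.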
Defining Coherent Memory System
1. Preserve Program Order: A read by processor P to location X
that follows a write by P to X, with no writes of X by another
processor occurring between the write and the read by P, always
returns the value written by P
2. Coherent view of memory: Read by a processor to location X that
follows a write by another processor to X returns the written value
if the read and write are sufficiently separated in time and no
other writes to X occur between the two accesses
3. Write serialization: 2 writes to same location by any 2 processors
are seen in the same order by all processors
– For example, if the values 1 and then 2 are written to a
location X by P1 and P2, processors can never read the value
of the location X as 2 and then later read it as 1
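Write serialization (rule 3) can be checked mechanically. A minimal sketch, with hypothetical helper names, that flags two processors observing writes to the same location in opposite orders:

```python
# Hypothetical checker for the write-serialization rule: every processor must
# observe the writes to a location in the same order.

def collapse(seq):
    """Drop consecutive duplicate reads; re-reading the same value is fine."""
    out = []
    for v in seq:
        if not out or out[-1] != v:
            out.append(v)
    return out

def write_serialized(observations):
    """observations: one list per processor of the values it read from X.
    Assumes distinct writes store distinct values (a simplification)."""
    orders = set()
    for seq in map(collapse, observations):
        for i in range(len(seq)):
            for j in range(i + 1, len(seq)):
                orders.add((seq[i], seq[j]))
    # A violation: some pair of values is observed in both orders.
    return not any((b, a) in orders for (a, b) in orders)

assert write_serialized([[1, 1, 2], [1, 2, 2]])   # all agree on 1 -> 2
assert not write_serialized([[1, 2], [2, 1]])     # P sees 1->2, P' sees 2->1
assert not write_serialized([[1, 2, 1]])          # older value after newer one
```

The last case is exactly the slide's example: once any processor has read 2, no processor may later read 1.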
Basic Schemes for Enforcing Coherence
• A program running on multiple processors will normally have copies of the
same data in several caches
• Rather than trying to avoid sharing in SW,
SMPs use a HW protocol to maintain coherent caches
–Migration and Replication key to performance of shared data
• Migration - data can be moved to a local cache and used there in
a transparent fashion
–Reduces both latency to access shared data that is allocated
remotely and bandwidth demand on the shared memory
• Replication – for shared data being simultaneously read, since
caches make a copy of data in local cache
–Reduces both latency of access and contention for reading
shared data
2 Classes of Cache Coherence Protocols
1. Snooping — Every cache with a copy of data also has a copy of
sharing status of block, but no centralized state is kept
• All caches are accessible via some broadcast medium (a bus or switch)
• All cache controllers monitor or snoop on the medium to determine
whether or not they have a copy of a block that is requested on a bus or
switch access
2. Directory based — Sharing status of a block of physical memory
is kept in just one location, the directory
Snoopy Cache-Coherence Protocols
• Each cache controller “snoops” all transactions on the shared
medium (bus or switch)
– a transaction is relevant if it is for a block the cache contains
– the controller takes action to ensure coherence
• invalidate, update, or supply the value
– the action depends on the state of the block and the protocol
• Either get exclusive access before a write (write
invalidate) or update all copies on a write
[Figure: processors P1 … Pn, each with a cache whose blocks carry state,
address (tag), and data, attached to a shared medium with memory and I/O
devices; each controller performs cache-memory transactions and snoops the bus.]
Example: Write-through Invalidate
• Must invalidate before step 3
• Write update uses more broadcast-medium bandwidth
⇒ all recent MPUs use write invalidate
[Figure: the same three-processor example, now with invalidation: (1) P1
reads u = 5; (2) P3 reads u = 5; (3) P3 writes u = 7, invalidating P1’s
copy; (4) P1 re-reads u and misses; (5) P2 reads u = 7.]
Two Classes of Cache Coherence Protocols
•Snooping Solution (Snoopy Bus)
– Send all requests for data to all processors
– Processors snoop to see if they have a copy and respond accordingly
– Requires broadcast, since caching information is at processors
– Works well with bus (natural broadcast medium)
– Dominates for small scale machines (most of the market)
•Directory-Based Schemes (Section 6.5)
– Directory keeps track of what is being shared in a centralized place
– Distributed memory => distributed directory for scalability
(avoids bottlenecks)
– Send point-to-point requests to processors via network
– Scales better than Snooping
– Actually existed BEFORE Snooping-based schemes
Basic Snoopy Protocols
• Write strategies
– Write-through: memory is always up-to-date
– Write-back: snoop the caches to find the most recent copy
There are two ways to maintain the coherence requirements using snooping protocols
• Write Invalidate Protocol
– Multiple readers, single writer
– Write to shared data: an invalidate is sent to all caches, which snoop and
invalidate any copies
– Read miss: a subsequent read by an invalidated cache misses and fetches a
new copy of the data
• Write Broadcast/Update Protocol
– Write to shared data: the write is broadcast on the bus; processors snoop and
update any copies
– Read miss: memory/cache is always up-to-date
• Write serialization: the bus serializes requests!
– The bus is a single point of arbitration
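The two policies can be contrasted with a toy snooping bus. Everything here (`Bus`, `SnoopyCache`, the string message kinds) is an illustrative sketch, not a real protocol implementation:

```python
# Minimal sketch of write-invalidate vs. write-update on a snooping bus.

class Bus:
    def __init__(self):
        self.caches = []

    def broadcast(self, kind, addr, value, origin):
        # Every other cache controller snoops the transaction.
        for c in self.caches:
            if c is not origin:
                c.snoop(kind, addr, value)

class SnoopyCache:
    def __init__(self, bus, policy):
        self.data, self.bus, self.policy = {}, bus, policy
        bus.caches.append(self)

    def write(self, addr, value):
        self.data[addr] = value
        self.bus.broadcast(self.policy, addr, value, self)

    def snoop(self, kind, addr, value):
        if addr in self.data:                # only relevant if we hold the block
            if kind == "invalidate":
                del self.data[addr]          # drop the copy; next read must miss
            else:                            # "update"
                self.data[addr] = value      # refresh the copy in place

bus = Bus()
a, b = SnoopyCache(bus, "invalidate"), SnoopyCache(bus, "invalidate")
a.data["X"] = b.data["X"] = 0
a.write("X", 1)
assert "X" not in b.data        # invalidate: B's copy is gone

bus2 = Bus()
c, d = SnoopyCache(bus2, "update"), SnoopyCache(bus2, "update")
c.data["X"] = d.data["X"] = 0
c.write("X", 1)
assert d.data["X"] == 1         # update: B's copy is refreshed
```

The bandwidth trade-off is visible in the model: update broadcasts the full value on every write, while invalidate sends only one invalidation per block until the next read miss.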
Examples of Basic Snooping Protocols
Assume neither cache initially holds X and the value of X in memory is 0
[Tables showing the bus activity for Write Invalidate and for Write Update
are not reproduced here.]
An Example Snoopy Protocol
Invalidation protocol, write-back cache
• Each cache block is in one state (the cache tracks these):
– Shared: the block can be read
– OR Exclusive: this cache has the only copy; it is writeable and dirty
– OR Invalid: the block contains no valid data
– an extra state bit (shared/exclusive) is associated with the valid bit and
dirty bit of each block
• Each block of memory is in one state:
– Clean in all caches and up-to-date in memory (Shared)
– OR Dirty in exactly one cache (Exclusive)
– OR Not in any caches
• Each processor snoops every address placed on the bus
– If a processor finds that it has a dirty copy of the requested cache block,
it provides that cache block in response to the read request
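Under the assumption that the three states behave as described, the per-block state machine can be written down as a transition table. This is a simplification of the full Figure 6.11 diagram (bus actions are plain strings, and unlisted state/event pairs are treated as no-ops):

```python
# Per-block state machine for the invalidation, write-back protocol.
# (state, event) -> (next state, bus action or None)

CPU = {
    ("Invalid",   "read"):  ("Shared",    "place read miss"),
    ("Invalid",   "write"): ("Exclusive", "place write miss"),
    ("Shared",    "read"):  ("Shared",    None),
    ("Shared",    "write"): ("Exclusive", "place write miss"),  # upgrade on hit
    ("Exclusive", "read"):  ("Exclusive", None),
    ("Exclusive", "write"): ("Exclusive", None),
}
SNOOP = {
    ("Shared",    "read miss"):  ("Shared",  None),
    ("Shared",    "write miss"): ("Invalid", None),
    ("Exclusive", "read miss"):  ("Shared",  "write back block"),
    ("Exclusive", "write miss"): ("Invalid", "write back block"),
}

def step(state, source, event):
    table = CPU if source == "cpu" else SNOOP
    return table.get((state, event), (state, None))

# A write hit in Shared still places a write miss on the bus to gain an
# exclusive copy (no data transfer), as the Figure 6.11 slide notes.
assert step("Shared", "cpu", "write") == ("Exclusive", "place write miss")
# Another cache's write miss invalidates our shared copy.
assert step("Shared", "bus", "write miss") == ("Invalid", None)
```

Reading the two tables side by side is essentially what the combined state diagram on the following slides shows: CPU requests in black, bus requests in gray.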
Cache Coherence Mechanism of the Example
Placing a write miss on the bus when a write hits in the shared state ensures an
exclusive copy (data not transferred)
Figure 6.11 State Transitions for Each Cache Block
• The CPU may have a read/write hit/miss on the block
• The cache may place a write/read miss on the bus
• The cache may receive a read/write miss from the bus
[Figure columns: requests from the CPU; requests from the bus]
Cache Coherence State Diagram
[Figure 6.10 and Figure 6.12: the combined state diagram, with CPU-induced
transitions in black and bus-induced transitions in gray, as in Figure 6.11.]
6.5 Distributed Shared-Memory Architectures
Distributed shared-memory architectures
• Separate memory per processor
– Local or remote access via the memory controller
– The physical address space is statically distributed
Coherence Problems
• Simple approach: uncacheable
– shared data are marked as uncacheable and only private data are kept in caches
– very long latency to access memory for shared data
• Alternative: a directory for memory blocks
– The directory tracks the state of every block in every cache
• which caches have copies of the memory block, dirty vs. clean, ...
– Two additional complications
• The interconnect cannot be used as a single point of arbitration like the bus
• Because the interconnect is message oriented, many messages must have
explicit responses
Distributed Directory Multiprocessor
To prevent the directory from becoming the bottleneck, directory entries are
distributed with the memory, each entry keeping track of which processors have
copies of its memory blocks
Directory Protocols
• Similar to Snoopy Protocol: Three states
– Shared: 1 or more processors have the block cached, and the value in memory is
up-to-date (as well as in all the caches)
– Uncached: no processor has a copy of the cache block (not valid in any cache)
– Exclusive: Exactly one processor has a copy of the cache block, and it has
written the block, so the memory copy is out of date
• The processor is called the owner of the block
• In addition to tracking the state of each cache block, we must track
the processors that have copies of the block when it is shared
(usually a bit vector for each memory block: 1 if processor has copy)
• Keep it simple(r):
– Writes to non-exclusive data
=> write miss
– Processor blocks until access completes
– Assume messages received and acted upon in order sent
Messages for Directory Protocols
•local node: the node where a request originates
•home node: the node where the memory location and directory entry of an address reside
•remote node: the node that has a copy of a cache block (exclusive or shared)
State Transition Diagram
for an Individual Cache Block
• Compared to snooping protocols:
– identical states
– the stimulus is almost identical
– a write to a shared cache block is
treated as a write miss (without
fetching the block)
– a cache block must be in the exclusive
state when it is written
– any shared block must be up to
date in memory
• write miss: data fetch and selective
invalidate operations are sent by the
directory controller (instead of the
broadcast used in snooping protocols)
State Transition Diagram for the Directory
[Figure 6.29: transition diagram for the directory entry of a cache block.]
Three requests: read miss, write miss, and data write-back
Directory Operations: Requests and Actions
• A message sent to the directory causes two actions:
– Update the directory
– Send further messages to satisfy the request
• Block is in the Uncached state: the copy in memory is the current value; the
only possible requests for that block are:
– Read miss: the requesting processor is sent the data from memory and is made
the only sharing node; the state of the block is made Shared.
– Write miss: the requesting processor is sent the value and becomes the sharing
node. The block is made Exclusive to indicate that the only valid copy is
cached. Sharers indicates the identity of the owner.
• Block is Shared => the memory value is up-to-date:
– Read miss: the requesting processor is sent the data from memory and is added
to the sharing set.
– Write miss: the requesting processor is sent the value. All processors in the
set Sharers are sent invalidate messages, and Sharers is set to the identity of
the requesting processor. The state of the block is made Exclusive.
Directory Operations: Requests and Actions (cont.)
• Block is Exclusive: the current value of the block is held in the cache of
the processor identified by the set Sharers (the owner) => three
possible directory requests:
– Read miss: the owner processor is sent a data fetch message, causing the state
of the block in the owner’s cache to transition to Shared and causing the owner
to send the data to the directory, where it is written to memory and sent back
to the requesting processor. The identity of the requesting processor is added
to the set Sharers, which still contains the identity of the processor that was
the owner (since it still has a readable copy). The state is Shared.
– Data write-back: the owner processor is replacing the block and hence must
write it back, making the memory copy up-to-date (the home directory
essentially becomes the owner); the block is now Uncached, and the Sharers
set is empty.
– Write miss: the block has a new owner. A message is sent to the old owner,
causing its cache to send the value of the block to the directory, from which
it is sent to the requesting processor, which becomes the new owner. Sharers
is set to the identity of the new owner, and the state of the block is made
Exclusive.
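The directory's bookkeeping in these three cases can be condensed into a small sketch. Messages to owners and sharers are elided; only the state, the sharer set, and the owner are modeled, and the class name is illustrative:

```python
# Toy directory entry for one memory block. When the state is Exclusive,
# the sharer set holds exactly the owner.

class Directory:
    def __init__(self):
        self.state = "Uncached"
        self.sharers = set()   # processor ids holding a copy

    def read_miss(self, p):
        # Uncached/Shared: memory supplies the data. Exclusive: the owner is
        # asked to write back and stays in Sharers with a readable copy.
        self.sharers.add(p)
        self.state = "Shared"

    def write_miss(self, p):
        # All other copies are invalidated (or fetched from the old owner);
        # p becomes the sole owner.
        self.sharers = {p}
        self.state = "Exclusive"

    def write_back(self, p):
        # The owner evicts the dirty block; memory is up to date again.
        self.sharers.discard(p)
        self.state = "Uncached"

d = Directory()
d.read_miss(0); d.read_miss(1)
assert d.state == "Shared" and d.sharers == {0, 1}
d.write_miss(2)                 # P0 and P1 are invalidated
assert d.state == "Exclusive" and d.sharers == {2}
d.read_miss(1)                  # owner P2 keeps a readable copy
assert d.state == "Shared" and d.sharers == {1, 2}

d2 = Directory()
d2.write_miss(0); d2.write_back(0)
assert d2.state == "Uncached" and not d2.sharers
```

Note how the read miss on an Exclusive block leaves the old owner in Sharers, matching the slide's point that the former owner still has a readable copy.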
Summary
Chapter 6. Multiprocessors and Thread-Level Parallelism
6.1 Introduction
6.2 Characteristics of Application Domains
6.3 Symmetric Shared-Memory Architectures
6.4 Performance of Symmetric Shared-Memory
Multiprocessors
6.5 Distributed Shared-Memory Architectures
6.6 Performance of Distributed Shared-Memory
Multiprocessors
6.7 Synchronization
6.8 Models of Memory Consistency: An Introduction
6.9 Multithreading: Exploiting Thread-Level Parallelism
within a Processor
More Related Content

Similar to 247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt

Memory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdfMemory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdfrajaratna4
 
Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptxMuhammad54342
 
Cache coherence problem and its solutions
Cache coherence problem and its solutionsCache coherence problem and its solutions
Cache coherence problem and its solutionsMajid Saleem
 
18 parallel processing
18 parallel processing18 parallel processing
18 parallel processingdilip kumar
 
Introduction to Thread Level Parallelism
Introduction to Thread Level ParallelismIntroduction to Thread Level Parallelism
Introduction to Thread Level ParallelismDilum Bandara
 
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmSolving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmDataWorks Summit
 
Module2 MultiThreads.ppt
Module2 MultiThreads.pptModule2 MultiThreads.ppt
Module2 MultiThreads.pptshreesha16
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)Romain Jacotin
 
Cache coherence
Cache coherenceCache coherence
Cache coherenceEmployee
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331Fengchang Xie
 

Similar to 247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt (20)

Memory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdfMemory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdf
 
Cache coherence
Cache coherenceCache coherence
Cache coherence
 
module4.ppt
module4.pptmodule4.ppt
module4.ppt
 
Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptx
 
Cache Coherence.pptx
Cache Coherence.pptxCache Coherence.pptx
Cache Coherence.pptx
 
Cache coherence problem and its solutions
Cache coherence problem and its solutionsCache coherence problem and its solutions
Cache coherence problem and its solutions
 
parallel-processing.ppt
parallel-processing.pptparallel-processing.ppt
parallel-processing.ppt
 
18 parallel processing
18 parallel processing18 parallel processing
18 parallel processing
 
Introduction to Thread Level Parallelism
Introduction to Thread Level ParallelismIntroduction to Thread Level Parallelism
Introduction to Thread Level Parallelism
 
Ch8 main memory
Ch8   main memoryCh8   main memory
Ch8 main memory
 
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmSolving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
 
Module2 MultiThreads.ppt
Module2 MultiThreads.pptModule2 MultiThreads.ppt
Module2 MultiThreads.ppt
 
Lecture2
Lecture2Lecture2
Lecture2
 
Distributed system
Distributed systemDistributed system
Distributed system
 
Cache coherence
Cache coherenceCache coherence
Cache coherence
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
 
Cache memory
Cache memoryCache memory
Cache memory
 
Cache coherence
Cache coherenceCache coherence
Cache coherence
 
Cs8493 unit 3
Cs8493 unit 3Cs8493 unit 3
Cs8493 unit 3
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331
 

Recently uploaded

handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailingAshishSingh1301
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxMustafa Ahmed
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxMANASINANDKISHORDEOR
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfJNTUA
 
CLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalCLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalSwarnaSLcse
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdfAlexander Litvinenko
 
Insurance management system project report.pdf
Insurance management system project report.pdfInsurance management system project report.pdf
Insurance management system project report.pdfKamal Acharya
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualBalamuruganV28
 
engineering chemistry power point presentation
engineering chemistry  power point presentationengineering chemistry  power point presentation
engineering chemistry power point presentationsj9399037128
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..MaherOthman7
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsMathias Magdowski
 
Software Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdfSoftware Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdfssuser5c9d4b1
 
Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
 
Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...IJECEIAES
 
electrical installation and maintenance.
electrical installation and maintenance.electrical installation and maintenance.
electrical installation and maintenance.benjamincojr
 
History of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationHistory of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationEmaan Sharma
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docxrahulmanepalli02
 
Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfJNTUA
 
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and ToolsMaximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Toolssoginsider
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New HorizonMorshed Ahmed Rahath
 

Recently uploaded (20)

handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailing
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptx
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptx
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
 
CLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalCLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference Modal
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
 
Insurance management system project report.pdf
Insurance management system project report.pdfInsurance management system project report.pdf
Insurance management system project report.pdf
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manual
 
engineering chemistry power point presentation
engineering chemistry  power point presentationengineering chemistry  power point presentation
engineering chemistry power point presentation
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility Applications
 
Software Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdfSoftware Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdf
 
Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...
 
Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...
 
electrical installation and maintenance.
electrical installation and maintenance.electrical installation and maintenance.
electrical installation and maintenance.
 
History of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationHistory of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & Modernization
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx
 
Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdf
 
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and ToolsMaximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon
 

247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt

  • 1. Unit III Multiprocessors and Thread-Level Parallelism By N.R.Rejin Paul Lecturer/VIT/CSE CS2354 Advanced Computer Architecture
  • 2. 2 Chapter 6. Multiprocessors and Thread-Level Parallelism 6.1 Introduction 6.2 Characteristics of Application Domains 6.3 Symmetric Shared-Memory Architectures 6.4 Performance of Symmetric Shared-Memory Multiprocessors 6.5 Distributed Shared-Memory Architectures 6.6 Performance of Distributed Shared-Memory Multiprocessors 6.7 Synchronization 6.8 Models of Memory Consistency: An Introduction 6.9 Multithreading: Exploiting Thread-Level Parallelism within a Processor
  • 3. 3 Taxonomy of Parallel Architectures Flynn Categories • SISD (Single Instruction Single Data) – Uniprocessors • MISD (Multiple Instruction Single Data) – ???; multiple processors on a single data stream • SIMD (Single Instruction Multiple Data) – same instruction executed by multiple processors using different data streams • Each processor has its data memory (hence multiple data) • There’s a single instruction memory and control processor – Simple programming model, Low overhead, Flexibility – (Phrase reused by Intel marketing for media instructions ~ vector) – Examples: vector architectures, Illiac-IV, CM-2 • MIMD (Multiple Instruction Multiple Data) – Each processor fetches its own instructions and operates on its own data – MIMD current winner: Concentrate on major design emphasis <= 128 processors • Use off-the-shelf microprocessors: cost-performance advantages • Flexible: high performance for one application, running many tasks simultaneously – Examples: Sun Enterprise 5000, Cray T3D, SGI Origin
  • 4. 4 MIMD Class 1: Centralized shared-memory multiprocessor share a single centralized memory, interconnect processors and memory by a bus • also known as “uniform memory access” time taken to access from all processor to memory is same (UMA) or “symmetric (shared-memory) multiprocessor” (SMP) – A symmetric relationship to all processors – A uniform memory access time from any processor • scalability problem: less attractive for large-scale processors
  • 5. 5 MIMD Class 2: Distributed-memory multiprocessor memory modules associated with CPUs • Advantages: – cost-effective way to scale memory bandwidth – lower memory latency for local memory access • Drawbacks – longer communication latency for communicating data between processors – software model more complex
  • 6. 6 6.3 Symmetric Shared-Memory Architectures Each processor have same relationship to single memory usually supports caching both private data and shared data Caching in shared-memory machines • private data: data used by a single processor – When a private item is cached, its location is migrated to the cache – Since no other processor uses the data, the program behavior is identical to that in a uniprocessor • shared data: data used by multiple processor – When shared data are cached, the shared value may be replicated in multiple caches – advantages: reduce access latency and fulfill bandwidth requirements, due to difference in communication for load store and strategy to write from caches values form diff. caches may not be consistent – induce a new problem: cache coherence Coherence cache provides: • migration: a data item can be moved to a local cache and used there in a transparent fashion • replication for shared data that are being simultaneously read both are critical to performance in accessing shared data
  • 7. 7 Multiprocessor Cache Coherence Problem • Informally: – “memory system is coherent if Any read must return the most recent write” – Coherent – defines what value can be returned by a read – Consistency – that determines when a return value will be returned by a read – Too strict and too difficult to implement • Better: – Write propagation : value return must visible to other caches “Any write must eventually be seen by a read” – All writes are seen in proper order by all caches(“serialization”) • Two rules to ensure this: – “If P writes x and then P1 reads it, P’s write will be seen by P1 if the read and write are sufficiently far apart” – Writes to a single location are serialized: seen in one order • Latest write will be seen • Otherwise could see writes in illogical order (could see older value after a newer value)
8
Example Cache Coherence Problem
– Processors see different values for u after event 3
[Figure: P1, P2, P3 with caches connected to memory and I/O devices; events: (1) P1 reads u = 5, (2) P3 reads u = 5, (3) P3 writes u = 7, (4) P1 reads u = ?, (5) P2 reads u = ?]
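The stale-value scenario on this slide can be reproduced in a few lines. The sketch below is illustrative only (the `Memory` and `Cache` classes are hypothetical, not from any real simulator): each processor has a private write-back cache with no coherence mechanism, so P3's write of u = 7 stays invisible to P1 and P2.

```python
class Memory:
    def __init__(self):
        self.data = {"u": 5}

class Cache:
    """A private cache with NO coherence support."""
    def __init__(self, mem):
        self.mem = mem
        self.lines = {}                      # address -> cached value

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from memory
            self.lines[addr] = self.mem.data[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value             # write-back: memory untouched

mem = Memory()
p1, p2, p3 = Cache(mem), Cache(mem), Cache(mem)

p1.read("u")          # event 1: P1 caches u = 5
p3.read("u")          # event 2: P3 caches u = 5
p3.write("u", 7)      # event 3: P3 writes u = 7 (only its own cache sees it)

print(p3.read("u"))   # 7 — P3 sees its own write
print(p1.read("u"))   # 5 — P1 still sees its stale cached copy
print(p2.read("u"))   # 5 — P2 misses and fetches the stale memory value
```

The three different answers for the same location are exactly the incoherence the following slides set out to prevent.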
9
Defining Coherent Memory System
1. Preserve program order: a read by processor P to location X that follows a write by P to X, with no writes to X by another processor occurring between the write and the read by P, always returns the value written by P
2. Coherent view of memory: a read by a processor to location X that follows a write by another processor to X returns the written value if the read and write are sufficiently separated in time and no other writes to X occur between the two accesses
3. Write serialization: two writes to the same location by any two processors are seen in the same order by all processors
– For example, if the values 1 and then 2 are written to a location X by P1 and P2, processors can never read the value of X as 2 and then later read it as 1
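Property 3 can be stated as a checkable predicate. The toy checker below (a hypothetical helper, not part of any standard) takes the global order of writes to X and each processor's sequence of observed values, and verifies that no processor sees a newer value before an older one:

```python
def serialized(write_order, observed_reads):
    """True iff every processor's reads respect the global write order."""
    rank = {v: i for i, v in enumerate(write_order)}   # write -> position
    return all(
        all(rank[a] <= rank[b] for a, b in zip(reads, reads[1:]))
        for reads in observed_reads.values()
    )

writes = [1, 2]   # P1 writes 1 to X, then P2 writes 2 to X
print(serialized(writes, {"P3": [1, 2], "P4": [2, 2]}))   # True
print(serialized(writes, {"P3": [2, 1]}))                 # False: 2 then 1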
10
Basic Schemes for Enforcing Coherence
• A program running on multiple processors will normally have copies of the same data in several caches
• Rather than trying to avoid sharing in software, SMPs use a hardware protocol to maintain coherent caches
– Migration and replication are key to the performance of shared data
• Migration – data can be moved to a local cache and used there in a transparent fashion
– Reduces both the latency to access shared data that is allocated remotely and the bandwidth demand on the shared memory
• Replication – for shared data being simultaneously read, since caches make a copy of the data in the local cache
– Reduces both the latency of access and contention for reading shared data
11
2 Classes of Cache Coherence Protocols
1. Snooping — every cache with a copy of a block also has a copy of the block's sharing status, but no centralized state is kept
• All caches are accessible via some broadcast medium (a bus or switch)
• All cache controllers monitor or "snoop" on the medium to determine whether they have a copy of a block that is requested on a bus or switch access
2. Directory based — the sharing status of a block of physical memory is kept in just one location, the directory
12
Snoopy Cache-Coherence Protocols
• The cache controller "snoops" on all transactions on the shared medium (bus or switch)
– a transaction is relevant if it is for a block the cache contains
– the controller takes action to ensure coherence
• invalidate, update, or supply the value
– the action depends on the state of the block and the protocol
• Either get exclusive access before a write (write invalidate) or update all copies on a write (write update)
[Figure: each cache line holds a state, address tag, and data; each cache snoops the bus between the processors and memory/I/O]
13
Example: Write-Through Invalidate
• The other cached copies must be invalidated before step 3 (the write) completes
• Write update uses more broadcast-medium bandwidth
=> all recent microprocessors use write invalidate
[Figure: same scenario as slide 8, but P3's write of u = 7 invalidates the other cached copies, so subsequent reads return 7]
14
Two Classes of Cache Coherence Protocols
• Snooping solution (snoopy bus)
– Send all requests for data to all processors
– Processors snoop to see if they have a copy and respond accordingly
– Requires broadcast, since caching information is at the processors
– Works well with a bus (a natural broadcast medium)
– Dominates for small-scale machines (most of the market)
• Directory-based schemes (Section 6.5)
– A directory keeps track of what is being shared in a centralized place
– Distributed memory => distributed directory for scalability (avoids bottlenecks)
– Send point-to-point requests to processors via the network
– Scales better than snooping
– Actually existed BEFORE snooping-based schemes
15
Basic Snoopy Protocols
• Write strategies
– Write-through: memory is always up-to-date
– Write-back: snoop in the caches to find the most recent copy
• There are two ways to maintain the coherence requirement using snooping protocols
• Write invalidate protocol
– Multiple readers, single writer
– Write to shared data: an invalidate is sent to all caches, which snoop and invalidate any copies
– Read miss: a subsequent read misses in the cache and fetches a new copy of the data
• Write broadcast/update protocol
– Write to shared data: broadcast on the bus; processors snoop and update any copies
– Read miss: memory/cache is always up-to-date
• Write serialization: the bus serializes requests!
– The bus is a single point of arbitration
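The difference between the two policies can be sketched with a toy bus model. This is a minimal sketch under simplifying assumptions (write-through caches, an instantaneous broadcast bus; the `SnoopyCache` and `Bus` names are made up for illustration): on a write, every other cache either drops its copy (invalidate) or overwrites it (update).

```python
class Bus:
    """A broadcast medium shared by all caches, plus the backing memory."""
    def __init__(self):
        self.caches, self.memory = [], {"X": 0}

class SnoopyCache:
    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:               # miss: fetch from memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value, policy):
        self.lines[addr] = value
        self.bus.memory[addr] = value            # write-through for simplicity
        for c in self.bus.caches:                # every cache snoops the bus
            if c is not self and addr in c.lines:
                if policy == "invalidate":
                    del c.lines[addr]            # other copies are discarded
                else:                            # policy == "update"
                    c.lines[addr] = value        # other copies are refreshed

bus = Bus()
a, b = SnoopyCache(bus), SnoopyCache(bus)
a.read("X"); b.read("X")                         # both caches hold X = 0
a.write("X", 1, policy="invalidate")
print("X" in b.lines)    # False — B's copy was invalidated
print(b.read("X"))       # 1 — B's next read misses and fetches the new value
```

With `policy="update"` the broadcast carries the new data to every holder, which is why write update consumes more bus bandwidth than write invalidate.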
16
Examples of Basic Snooping Protocols
Assume neither cache initially holds X and the value of X in memory is 0
[Tables: write invalidate vs. write update activity for the same access sequence]
17
An Example Snoopy Protocol
Invalidation protocol, write-back cache
• Each cache block is in one state (tracked per block):
– Shared: the block can be read
– OR Exclusive: this cache has the only copy; it is writable and dirty
– OR Invalid: the block contains no valid data
– an extra state bit (shared/exclusive) is associated with the valid bit and dirty bit of each block
• Each block of memory is in one state:
– Clean in all caches and up-to-date in memory (Shared)
– OR Dirty in exactly one cache (Exclusive)
– OR Not in any cache
• Each processor snoops every address placed on the bus
– If a processor finds that it has a dirty copy of the requested cache block, it provides that block in response to the read request
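The per-block state machine on this slide can be written down as a transition table. The sketch below is a simplified rendering of the Invalid/Shared/Exclusive protocol (event names like `cpu_read` and `bus_write_miss` are my labels, not the book's); side effects such as write-backs and bus requests appear only as comments:

```python
INVALID, SHARED, EXCLUSIVE = "I", "S", "E"

def next_state(state, event):
    """Next state of a cache block for a CPU- or bus-induced event."""
    transitions = {
        (INVALID,   "cpu_read"):       SHARED,     # place read miss on bus
        (INVALID,   "cpu_write"):      EXCLUSIVE,  # place write miss on bus
        (SHARED,    "cpu_read"):       SHARED,     # read hit
        (SHARED,    "cpu_write"):      EXCLUSIVE,  # place write miss (upgrade)
        (EXCLUSIVE, "cpu_read"):       EXCLUSIVE,  # read hit
        (EXCLUSIVE, "cpu_write"):      EXCLUSIVE,  # write hit
        (SHARED,    "bus_write_miss"): INVALID,    # another CPU writes: invalidate
        (EXCLUSIVE, "bus_write_miss"): INVALID,    # write back block, invalidate
        (EXCLUSIVE, "bus_read_miss"):  SHARED,     # supply dirty data, demote
    }
    return transitions.get((state, event), state)  # all other events: no change

s = INVALID
s = next_state(s, "cpu_read")          # read miss -> Shared
s = next_state(s, "cpu_write")         # write to shared block -> Exclusive
print(s)                               # E
print(next_state(s, "bus_read_miss"))  # S — another reader demotes the owner
```

Note the `(SHARED, "cpu_write")` row: as the next slide says, a write hit in the Shared state still places a write miss on the bus so that all other copies are invalidated, even though no data needs to be transferred.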
18
Cache Coherence Mechanism of the Example
Placing a write miss on the bus when a write hits in the Shared state ensures an exclusive copy (no data is transferred)
19
Figure 6.11 State Transitions for Each Cache Block
• The CPU may read/write hit or miss on the block
• The cache may place a write/read miss on the bus
• The cache may receive a read/write miss from the bus
[Figure: requests from the CPU and requests from the bus]
20
Cache Coherence State Diagram
Figure 6.10 and Figure 6.12 (CPU-induced transitions in black and bus-induced transitions in gray, from Figure 6.11)
21
6.5 Distributed Shared-Memory Architectures
• Separate memory per processor
– Local or remote access via the memory controller
– The physical address space is statically distributed
Coherence problems
• Simple approach: uncacheable
– shared data are marked as uncacheable and only private data are kept in caches
– very long latency to access memory for shared data
• Alternative: a directory for memory blocks
– The directory per memory tracks the state of every block in every cache
• which caches have copies of the memory block, dirty vs. clean, ...
– Two additional complications
• The interconnect cannot be used as a single point of arbitration like the bus
• Because the interconnect is message oriented, many messages must have explicit responses
22
Distributed Directory Multiprocessor
To prevent the directory from becoming a bottleneck, directory entries are distributed along with the memory; each directory keeps track of which processors have copies of its memory blocks
23
Directory Protocols
• Similar to the snoopy protocol: three states
– Shared: 1 or more processors have the block cached, and the value in memory is up-to-date (as well as in all the caches)
– Uncached: no processor has a copy of the cache block (not valid in any cache)
– Exclusive: exactly one processor has a copy of the cache block, and it has written the block, so the memory copy is out of date
• That processor is called the owner of the block
• In addition to tracking the state of each cache block, we must track the processors that have copies of the block when it is shared (usually a bit vector per memory block: bit i is 1 if processor i has a copy)
• Keep it simple(r):
– Writes to non-exclusive data => write miss
– The processor blocks until the access completes
– Assume messages are received and acted upon in the order sent
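A directory entry is just the state plus the sharer bit vector described above. The sketch below (a hypothetical `DirEntry` class, with the Uncached-to-Shared and invalidation behavior from the slides; the real hardware would also move the data) shows how the bit vector is maintained on read and write misses:

```python
N_PROCS = 4

class DirEntry:
    """One directory entry: block state + sharer bit vector."""
    def __init__(self):
        self.state = "Uncached"
        self.sharers = [False] * N_PROCS   # bit i: does processor i hold a copy?

    def read_miss(self, p):
        if self.state == "Uncached":
            self.state = "Shared"          # first reader makes the block Shared
        self.sharers[p] = True             # add requester to the sharer set

    def write_miss(self, p):
        # Every current sharer except the writer must receive an invalidate.
        invalidate = [i for i, s in enumerate(self.sharers) if s and i != p]
        self.state = "Exclusive"
        self.sharers = [False] * N_PROCS
        self.sharers[p] = True             # requester becomes the sole owner
        return invalidate                  # destinations of invalidate messages

d = DirEntry()
d.read_miss(0); d.read_miss(2)
print(d.state, d.sharers)   # Shared [True, False, True, False]
print(d.write_miss(1))      # [0, 2] — invalidates go only to P0 and P2
print(d.state, d.sharers)   # Exclusive [False, True, False, False]
```

The returned list is the "selective invalidate" of the later slides: unlike a snooping bus, the directory sends point-to-point invalidates only to the processors whose bit is set.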
24
Messages for Directory Protocols
• local node: the node where a request originates
• home node: the node where the memory location and directory entry of an address reside
• remote node: the node that has a copy of a cache block (exclusive or shared)
25
State Transition Diagram for an Individual Cache Block
• Compared to snooping protocols:
– identical states
– the stimulus is almost identical
– a write to a shared cache block is treated as a write miss (without fetching the block)
– a cache block must be in the Exclusive state when it is written
– any Shared block must be up to date in memory
• write miss: data fetch and selective invalidate operations are sent by the directory controller (broadcast in snooping protocols)
26
State Transition Diagram for the Directory
Figure 6.29 Transition diagram for the directory entry of a cache block
Three requests: read miss, write miss, and data write-back
27
Directory Operations: Requests and Actions
• A message sent to the directory causes two actions:
– update the directory
– send further messages to satisfy the request
• Block in the Uncached state: the copy in memory is the current value; the only possible requests for that block are:
– Read miss: the requesting processor is sent the data from memory and becomes the only sharing node; the state of the block is made Shared
– Write miss: the requesting processor is sent the value and becomes the sharing node. The block is made Exclusive to indicate that the only valid copy is cached. Sharers indicates the identity of the owner
• Block in the Shared state => the memory value is up-to-date:
– Read miss: the requesting processor is sent the data from memory and is added to the sharing set
– Write miss: the requesting processor is sent the value. All processors in the set Sharers are sent invalidate messages, and Sharers is set to the identity of the requesting processor. The state of the block is made Exclusive
28
Directory Operations: Requests and Actions (cont.)
• Block in the Exclusive state: the current value of the block is held in the cache of the processor identified by the set Sharers (the owner) => three possible directory requests:
– Read miss: the owner is sent a data fetch message, causing the state of the block in the owner's cache to transition to Shared and causing the owner to send the data to the directory, where it is written to memory and sent back to the requesting processor. The identity of the requesting processor is added to the set Sharers, which still contains the identity of the processor that was the owner (since it still has a readable copy). The state is made Shared
– Data write-back: the owner is replacing the block and hence must write it back, making the memory copy up-to-date (the home directory essentially becomes the owner); the block is now Uncached, and the Sharers set is empty
– Write miss: the block has a new owner. A message is sent to the old owner, causing its cache to send the value of the block to the directory, from which it is sent to the requesting processor, which becomes the new owner. Sharers is set to the identity of the new owner, and the state of the block remains Exclusive
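The three Exclusive-state cases above can be condensed into one handler. This is a sketch only (the message names `fetch`, `fetch_invalidate`, and `data_reply` are mine, and the entry is a plain dict rather than real directory hardware); it shows the state and sharer-set updates plus the messages the home directory would send:

```python
def handle_exclusive(entry, request, requester=None):
    """Directory actions for a block currently in the Exclusive state."""
    owner = entry["sharers"][0]            # Exclusive => exactly one sharer
    if request == "read_miss":
        # Fetch from the owner, demote it to Shared, add the requester.
        entry["state"] = "Shared"
        entry["sharers"] = sorted({owner, requester})
        return [("fetch", owner), ("data_reply", requester)]
    if request == "write_back":
        # Owner evicts the block; memory becomes up-to-date, entry emptied.
        entry["state"] = "Uncached"
        entry["sharers"] = []
        return []
    if request == "write_miss":
        # Old owner's copy is fetched and invalidated; requester becomes
        # the new owner, and the state stays Exclusive.
        entry["sharers"] = [requester]
        return [("fetch_invalidate", owner), ("data_reply", requester)]

e = {"state": "Exclusive", "sharers": [3]}
msgs = handle_exclusive(e, "read_miss", requester=1)
print(e["state"], e["sharers"])   # Shared [1, 3] — old owner keeps a copy
print(msgs)                       # [('fetch', 3), ('data_reply', 1)]
```

The read-miss case makes the key point concrete: the old owner stays in the sharer set because its copy, once written back, is still readable.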
29
Summary
Chapter 6. Multiprocessors and Thread-Level Parallelism
6.1 Introduction
6.2 Characteristics of Application Domains
6.3 Symmetric Shared-Memory Architectures
6.4 Performance of Symmetric Shared-Memory Multiprocessors
6.5 Distributed Shared-Memory Architectures
6.6 Performance of Distributed Shared-Memory Multiprocessors
6.7 Synchronization
6.8 Models of Memory Consistency: An Introduction
6.9 Multithreading: Exploiting Thread-Level Parallelism within a Processor