ADVANCED COMPUTER ARCHITECTURE
AND PARALLEL PROCESSING
Shared memory architecture
1- Shared memory systems form a major category of multiprocessors.
2- In this category all processors share a global memory.
3- Communication between tasks running on different processors is performed
through writing to and reading from the global memory.
4- All interprocessor coordination and synchronization is also accomplished via
the global memory.
5- A shared memory computer system consists of:
- a set of independent processors,
- a set of memory modules, and
- an interconnection network, as shown in Figure 4.1.
6- Two main problems need to be addressed when designing a shared memory
system:
• performance degradation due to contention, and
• coherence problems
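To make items 3 and 4 concrete, here is a minimal sketch (not part of the original slides) of two tasks communicating and synchronizing purely through shared memory. The use of POSIX threads and a mutex/condition-variable pairing is an illustrative assumption; on a real shared memory multiprocessor the two tasks would run on different processors.

```c
/* Minimal sketch: two tasks communicate through a shared ("global
 * memory") variable and coordinate through a mutex and condition
 * variable. POSIX threads are an illustrative assumption. */
#include <pthread.h>
#include <stdio.h>

static int shared_value;                 /* the "global memory" location */
static int ready;                        /* synchronization flag         */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    shared_value = 42;                   /* write to global memory       */
    ready = 1;
    pthread_cond_signal(&cond);          /* coordinate via global memory */
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!ready)                       /* wait until the write is seen */
        pthread_cond_wait(&cond, &lock);
    printf("read %d from shared memory\n", shared_value);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```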
Shared memory architecture cont.
[Figure 4.1: a shared memory system, with a set of processors and a set of memory modules connected by an interconnection network]
Shared memory architecture cont.
• Performance degradation can occur when multiple processors try to access the shared memory simultaneously. A typical remedy is to give each processor a cache.
• Caches: having multiple copies of data spread throughout the caches may lead to a coherence problem.
- The copies in the caches are coherent if they all hold the same value.
- If one of the processors writes over the value of one of the copies, that copy becomes inconsistent because it no longer equals the value of the other copies.
4.1 CLASSIFICATION OF SHARED
MEMORY SYSTEMS
• The simplest shared memory system consists of one memory module (M)
that can be accessed from two processors (P1 and P2).
• An arbitration unit within the memory module passes requests through to a
memory controller. If the memory module is not busy and a single request
arrives, then the arbitration unit passes that request to the memory
controller and the request is satisfied. The module is placed in the busy
state while a request is being serviced. If a new request arrives while the
memory is busy servicing a previous request, the memory module sends a
wait signal, through the memory controller, to the processor making the new
request.
4.1 CLASSIFICATION OF SHARED
MEMORY SYSTEMS cont.
• In response, the requesting processor may hold its request on the line until the
memory becomes free or it may repeat its request some time later. If the
arbitration unit receives two requests, it selects one of them and passes it to the
memory controller. Again, the denied request can be either held to be served
next or it may be repeated some time later.
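The request/wait/retry behaviour described above can be sketched as a toy simulation. The structure below is an assumption for illustration only, not an actual memory-controller design.

```c
/* Toy simulation of the arbitration behaviour described above: one
 * memory module, requests either accepted or answered with a wait
 * signal, and a busy state that clears after one memory cycle. */
#include <stdio.h>
#include <stdbool.h>

enum { IDLE, BUSY };

typedef struct { int state; int cycles_left; } MemModule;

/* Returns true if accepted; false means a wait signal was sent back
 * to the requesting processor, which may hold or repeat the request. */
static bool request(MemModule *m, int proc) {
    if (m->state == BUSY) {
        printf("P%d: wait signal, will retry\n", proc);
        return false;
    }
    m->state = BUSY;                /* busy while being serviced   */
    m->cycles_left = 1;
    printf("P%d: request accepted\n", proc);
    return true;
}

static void tick(MemModule *m) {    /* advance one memory cycle    */
    if (m->state == BUSY && --m->cycles_left == 0)
        m->state = IDLE;
}

int main(void) {
    MemModule m = { IDLE, 0 };
    request(&m, 1);                 /* P1 is accepted              */
    request(&m, 2);                 /* P2 arrives while busy: wait */
    tick(&m);                       /* module finishes, goes idle  */
    request(&m, 2);                 /* P2's repeated request wins  */
    return 0;
}
```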
• Categories of shared memory systems, based on the interconnection network:
 Uniform Memory Access (UMA)
 Non-Uniform Memory Access (NUMA)
 Cache-Only Memory Architecture (COMA)
Uniform Memory Access (UMA)
 In a UMA system, the shared memory is accessible to all processors through an interconnection network, in the same way a single processor accesses its memory.
 All processors have equal access time to any memory location.
 The interconnection network may be a single bus, multiple buses, or a crossbar switch (with straight or diagonal switch settings).
 UMA systems are therefore also called SMP (symmetric multiprocessor) systems.
 Access to shared memory is balanced: each processor has an equal opportunity to read/write to memory, including equal access speed.
Nonuniform Memory Access
(NUMA)
• In a NUMA system, each processor has part of the shared memory attached to it. The memory has a single address space, so any processor can access any memory location directly using its real address. However, the access time to a module depends on its distance from the processor.
• Tree and hierarchical bus networks are typical architectures used to interconnect the processors.
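A rough sketch of the "single address space, distance-dependent access time" idea follows; the node size and cycle counts are invented numbers for illustration, not values from the text.

```c
/* Illustrative NUMA address mapping: one flat address space spread
 * over nodes, with invented latencies that grow with distance. */
#include <stdio.h>

#define NODE_BYTES (1u << 20)   /* assumed: 1 MiB of memory per node */

/* The high-order address bits pick the home node. */
static int home_node(unsigned addr) { return (int)(addr / NODE_BYTES); }

/* Assumed costs: 10 cycles locally, plus 40 cycles per hop away. */
static int access_cycles(int requester, unsigned addr) {
    int home = home_node(addr);
    int dist = home > requester ? home - requester : requester - home;
    return 10 + 40 * dist;
}

int main(void) {
    printf("P0 -> addr 0x000000: %d cycles (local)\n",
           access_cycles(0, 0x000000));
    printf("P0 -> addr 0x300000: %d cycles (remote)\n",
           access_cycles(0, 0x300000));
    return 0;
}
```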
Cache-Only Memory Architecture
(COMA)
• As in NUMA, each processor in a COMA system has part of the shared memory attached to it. However, in this case the shared memory consists of cache memory.
• COMA requires that data be migrated to the processor requesting it.
• There is no memory hierarchy, and the address space is made up of all the caches. A cache directory (D) helps in remote cache access.
4.2 BUS-BASED SYMMETRIC
MULTIPROCESSORS
 Shared memory systems can be designed using:
• bus-based interconnection networks (the simplest network for shared memory systems), or
• switch-based interconnection networks.
 The bus/cache architecture alleviates the need for expensive multi-ported memories and interface circuitry, as well as the need to adopt a message-passing paradigm when developing application software.
 However, the bus may get saturated if multiple processors try to access the shared memory (via the bus) simultaneously. This is the contention problem, and it is solved using caches.
 High-speed caches connected to each processor on one side and the bus on the other side mean that local copies of instructions and data can be supplied at the highest possible rate.
 If the local processor finds all of its instructions and data in the local cache, the hit rate is 100%. Otherwise, misses occur, and the missing items must be copied from the global memory, across the bus, into the cache, and then passed on to the local processor.
4.2 BUS-BASED SYMMETRIC
MULTIPROCESSORS cont.
 One of the goals of the cache is to maintain a high hit rate (or low miss rate) under high processor loads. A high hit rate means the processors are not using the bus as much. Hit rates are determined by a number of factors, ranging from the application programs being run to the manner in which the cache hardware is implemented.
4.2 BUS-BASED SYMMETRIC
MULTIPROCESSORS cont.
• We define the variables for hit rate, number of processors, processor speed, bus speed, and processor duty cycle as follows:
 N = number of processors;
 h = hit rate of each cache, assumed to be the same for all caches;
 (1 - h) = miss rate of all caches;
 B = bandwidth of the bus, measured in cycles/second;
 I = processor duty cycle, assumed to be identical for all processors, in
fetches/cycle; and
 V = peak processor speed, in fetches/second
4.2 BUS-BASED SYMMETRIC
MULTIPROCESSORS cont.
• The effective bandwidth of the bus is BI fetches/second.
If each processor is running at a speed of V, then misses are generated at a rate of (1 − h)V per processor. For an N-processor system, misses are therefore generated simultaneously at a rate of N(1 − h)V fetches/second. The bus saturates when this combined miss rate exceeds the effective bus bandwidth, so avoiding saturation requires

N(1 − h)V ≤ BI.

The maximum number of processors with cache memories that the bus can support is therefore given by the relation

N ≤ BI / ((1 − h)V).
4.2 BUS-BASED SYMMETRIC
MULTIPROCESSORS cont.
• Example 1
Suppose a shared memory system is constructed from processors that can execute V = 10^7 instructions/s and the processor duty cycle is I = 1. The caches are designed to support a hit rate of 97%, and the bus supports a peak bandwidth of B = 10^6 cycles/s. Then (1 − h) = 0.03, and the maximum number of processors N is

N ≤ (10^6 × 1) / (0.03 × 10^7) = 3.33.

Thus, the system we have in mind can support only three processors!
We might ask what hit rate is needed to support a 30-processor system. In this case,

h = 1 − BI/(NV) = 1 − (10^6 × 1)/((30)(10^7)) = 1 − 1/300 ≈ 0.9967,

so for the system we have in mind, h = 0.9967. Increasing h by about 2.8% supports a factor of ten more processors.
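The arithmetic above can be checked with a few lines of code; this is just a sketch of the bound N ≤ BI/((1 − h)V), not anything from the original slides.

```c
/* Checks Example 1 against the bound N <= BI / ((1 - h)V). */
#include <stdio.h>

static double max_processors(double B, double I, double h, double V) {
    return (B * I) / ((1.0 - h) * V);
}

int main(void) {
    double B = 1e6, I = 1.0, V = 1e7;

    /* h = 0.97 gives N <= 3.33, i.e. only three processors. */
    printf("h = 0.97 -> N <= %.2f\n", max_processors(B, I, 0.97, V));

    /* Hit rate needed for N = 30: h = 1 - BI/(NV) = 0.9967. */
    double h30 = 1.0 - (B * I) / (30.0 * V);
    printf("N = 30   -> h = %.4f\n", h30);
    return 0;
}
```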
4.3 BASIC CACHE COHERENCY
METHODS
• Multiple copies of data, spread throughout the caches, lead to a coherence
problem among the caches. The copies in the caches are coherent if they all
equal the same value. However, if one of the processors writes over the value of
one of the copies, then the copy becomes inconsistent because it no longer
equals the value of the other copies. If data are allowed to become inconsistent
(incoherent), incorrect results will be propagated through the system, leading to
incorrect final results. Cache coherence algorithms are needed to maintain a
level of consistency throughout the parallel system.
[Figure: copies of data in the caches may be coherent or inconsistent]
Cache–Memory Coherence
 In a single cache system, coherence between memory and the cache is
maintained using one of two policies:
• write-through:
The memory is updated every time the cache is updated.
• write-back:
The memory is updated only when the block in the cache is being
replaced.
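As a concrete (and deliberately simplified) illustration of the two policies, here is a one-block cache sketch; the structure and toy memory are assumptions, not code from the slides.

```c
/* One-block cache sketch contrasting write-through (memory updated on
 * every cache write) with write-back (memory updated at replacement). */
#include <stdio.h>
#include <stdbool.h>

enum policy { WRITE_THROUGH, WRITE_BACK };

typedef struct {
    int  tag, data;
    bool valid, dirty;
    enum policy pol;
} Cache;

static int memory[16];                    /* toy main memory */

static void cache_write(Cache *c, int addr, int val) {
    c->tag = addr; c->data = val; c->valid = true;
    if (c->pol == WRITE_THROUGH)
        memory[addr] = val;               /* update memory every time  */
    else
        c->dirty = true;                  /* defer the memory update   */
}

static void replace_block(Cache *c) {
    if (c->pol == WRITE_BACK && c->valid && c->dirty)
        memory[c->tag] = c->data;         /* write back on replacement */
    c->valid = c->dirty = false;
}

int main(void) {
    Cache c = { 0, 0, false, false, WRITE_BACK };
    cache_write(&c, 3, 99);
    printf("before replacement: memory[3] = %d\n", memory[3]);  /* 0  */
    replace_block(&c);
    printf("after replacement:  memory[3] = %d\n", memory[3]);  /* 99 */
    return 0;
}
```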
Cache–Memory Coherence
[Figure: write-through vs. write-back policies]
Cache–Cache Coherence
• In a multiprocessing system, when a task running on processor P requests the data in global memory location X, the contents of X are copied to processor P's local cache, from which they are passed on to P. Now suppose processor Q also accesses X. What happens if Q wants to write a new value over the old value of X?
 There are two fundamental cache coherence policies:
 write-invalidate
Maintains consistency by reading from local caches until a write occurs.
 write-update
Maintains consistency by immediately updating all copies in all caches.
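The difference between the two policies can be sketched as below; the snooping-style loop over all caches is an illustrative assumption, not a full protocol.

```c
/* Sketch: on a write to X, the writer's cache either invalidates the
 * other copies (write-invalidate) or updates them (write-update). */
#include <stdio.h>
#include <stdbool.h>

#define NCACHES 4

typedef struct { int x; bool valid; } CacheLine;
static CacheLine caches[NCACHES];

enum policy { WRITE_INVALIDATE, WRITE_UPDATE };

static void write_x(int writer, int val, enum policy pol) {
    caches[writer].x = val;
    caches[writer].valid = true;
    for (int i = 0; i < NCACHES; i++) {
        if (i == writer) continue;
        if (pol == WRITE_INVALIDATE)
            caches[i].valid = false;      /* other copies become stale */
        else if (caches[i].valid)
            caches[i].x = val;            /* other copies updated now  */
    }
}

int main(void) {
    for (int i = 0; i < NCACHES; i++) { caches[i].x = 7; caches[i].valid = true; }
    write_x(0, 8, WRITE_INVALIDATE);      /* only cache 0 stays valid  */
    for (int i = 0; i < NCACHES; i++)
        printf("cache %d: x=%d valid=%d\n", i, caches[i].x, caches[i].valid);
    return 0;
}
```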
Cache–Cache Coherence
[Figure: write-invalidate vs. write-update]
Shared Memory System
Coherence
• The four combinations to maintain coherence among all caches and
global memory are:
 Write-update and write-through
 Write-update and write-back
 Write-invalidate and write-through
 Write-invalidate and write-back
• If we permit write-update and write-through directly on global memory location X, the bus starts to get busy and ultimately all processors will be idle while waiting for writes to complete. In write-update and write-back, only the copies in the caches are updated. On the contrary, if the write is limited to the copy of X in cache Q, the caches become inconsistent on X. Setting the dirty bit prevents the spread of inconsistent values of X, but at some point the inconsistent copies must be updated.
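As a final sketch (an assumption for illustration, using the last combination above), write-invalidate with write-back leaves the writer holding the only valid, dirty copy; global memory is updated only when that block is replaced.

```c
/* Write-invalidate + write-back: the writer invalidates the other
 * copies, marks its own copy dirty, and memory is updated on eviction. */
#include <stdio.h>
#include <stdbool.h>

typedef struct { int x; bool valid, dirty; } Line;

static int mem_x;                          /* global memory copy of X */

static void write_invalidate_wb(Line caches[], int n, int writer, int val) {
    for (int i = 0; i < n; i++)
        caches[i].valid = (i == writer);   /* invalidate other copies */
    caches[writer].x = val;
    caches[writer].dirty = true;           /* memory is now stale     */
}

static void evict(Line *l) {
    if (l->valid && l->dirty)
        mem_x = l->x;                      /* dirty bit forces update */
    l->valid = l->dirty = false;
}

int main(void) {
    Line caches[2] = { { 5, true, false }, { 5, true, false } };
    mem_x = 5;
    write_invalidate_wb(caches, 2, 0, 9);  /* P0 writes 9 to X        */
    printf("before eviction: mem_x = %d\n", mem_x);   /* still 5 */
    evict(&caches[0]);
    printf("after eviction:  mem_x = %d\n", mem_x);   /* now 9   */
    return 0;
}
```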