This chapter discusses shared memory architecture and classifications of shared memory systems. It describes Uniform Memory Access (UMA), Non-Uniform Memory Access (NUMA), and Cache Only Memory Architecture (COMA). It also covers basic cache coherency methods like write-through, write-back, write-invalidate, and write-update. Finally, it discusses snooping protocols and cache coherency techniques used in shared memory systems.
Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing frequency scaling. As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing frequency scaling. As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
Parallel computing is computing architecture paradigm ., in which processing required to solve a problem is done in more than one processor parallel way.
In this presentation, you will learn the fundamentals of Multi Processors and Multi Computers in only a few minutes.
Meanings, features, attributes, applications, and examples of multiprocessors and multi computers.
So, let's get started. If you enjoy this and find the information beneficial, please like and share it with your friends.
MODULE III Parallel Processors and Memory Organization 15 Hours
Parallel Processors: Introduction to parallel processors, Concurrent access to memory and cache
coherency. Introduction to multicore architecture. Memory system design: semiconductor memory
technologies, memory organization. Memory interleaving, concept of hierarchical memory
organization, cache memory, cache size vs. block size, mapping functions, replacement
algorithms, write policies.
Case Study: Instruction sets of some common CPUs - Design of a simple hypothetical CPU- A
sequential Y86-64 design-Sun Ultra SPARC II pipeline structure
Parallel computing is computing architecture paradigm ., in which processing required to solve a problem is done in more than one processor parallel way.
In this presentation, you will learn the fundamentals of Multi Processors and Multi Computers in only a few minutes.
Meanings, features, attributes, applications, and examples of multiprocessors and multi computers.
So, let's get started. If you enjoy this and find the information beneficial, please like and share it with your friends.
MODULE III Parallel Processors and Memory Organization 15 Hours
Parallel Processors: Introduction to parallel processors, Concurrent access to memory and cache
coherency. Introduction to multicore architecture. Memory system design: semiconductor memory
technologies, memory organization. Memory interleaving, concept of hierarchical memory
organization, cache memory, cache size vs. block size, mapping functions, replacement
algorithms, write policies.
Case Study: Instruction sets of some common CPUs - Design of a simple hypothetical CPU- A
sequential Y86-64 design-Sun Ultra SPARC II pipeline structure
For over 40 years, virtually all computers have followed a common machine model known as the von Neumann computer. Name after the Hungarian mathématicien John von Neumann.
A von Neumann computer uses the stored-program concept. The CPU executes a stored program that specifies a sequence of read and write operations on the memory.
Caches in multiprocessing environment introduce the Cache Coherence problem.
When multiple processors maintain locally cached copies of a unique shared memory location, any local modification of the location can result in a globally inconsistent view of memory. This is called Cache Coherence Problem.
A brief discussion about its solutions are given.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
2. 4.1 Classification of Shared
Memory Systems
• Shared memory systems are multi-port
and categorized as follows:
– Uniform memory Access (UMA)
– Non-uniform Memory Access (NUMA)
– Cache Only Memory Architecture (COMA)
M
P1 P2
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
3. 4.1 Classification of Shared
Memory Systems
• Uniform Memory Access (UMA)
– Shared memory is accessible by all processors through an
interconnection network in the same way a single processor
accesses its memory.
– All processors have equal access time to any memory location.
– Because access to the shared memory is balanced, these
systems are called SMP (symmetric multiprocessor)
M M M M
Bus
C C C C
P P P P
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
4. 4.1 Classification of Shared
Memory Systems
• Non Uniform Memory Access (NUMA)
– Each processor has part of the shared memory attached.
– The access time to modules depend on the distance to the
processor, which results a non-uniform memory access time.
Interconnection Network
P P P P
M M M M
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
5. 4.1 Classification of Shared
Memory Systems
• Cache-Only Memory Architecture (COMA)
– Like NUMA, each processor has part of the shared memory in
COMA.
– COMA requires the data to be migrated to the processor
requesting it.
Interconnection Network
D D D D
C C C C
P P P P
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
6. 4.2 Bus-Based Symmetric
Multiprocessors
• Shared memory systems can be designed using
bus-based or switch-based interconnection
networks.
Global Memory P C
P C
C C C P C
P P P
P C
M M M M
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
7. 4.3 Basic Cache Coherency
Methods
• Cache-memory coherence using two
policies:
– Write-Through:
• The memory is updated every time the cache is
updated.
– Write-Back:
• The memory is updated only when the block in the
cache is being replaced.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
8. 4.3 Basic Cache Coherency
Methods
• Cache-memory coherence using two
policies:
x Memory x’ Memory x Memory
x Cache x’ Cache x’ Cache
P P P
Before Write through Write back
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
9. 4.3 Basic Cache Coherency
Methods
• Cache-cache coherence using two
policies:
– Write-Invalidate:
• Maintains consistency by reading from local
caches until a write occurs.
– Write Update:
• Maintains consistency by immediately updating all
copies in all caches.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
10. 4.3 Basic Cache Coherency
Methods
• Write-Invalidate:
x x’ x
x x x’ I x’ I
P1 P2 P3 P1 P2 P3 P1 P2 P3
Before Write Through Write back
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
11. 4.3 Basic Cache Coherency
Methods
• Write-Update:
x x’ x
x x x’ x’ x’ x’
P1 P2 P3 P1 P2 P3 P1 P2 P3
Before Write Through Write back
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
12. 4.3 Basic Cache Coherency
Methods
• Shared Memory System Coherence :
– Four combinations to maintain coherence
among all caches and global memory:
• Write-Update and Write-Through.
• Write-Update and Write-Back.
• Write-Invalidate and Write-Through.
• Write-Invalidate and Write-Back.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
13. 4.4 Snooping Protocols
• Based on watching bus activities and carry out the
appropriate coherency commands when necessary.
• Global memory is moved in blocks, and each block has a
state associated with it, which determines what happens
to the entire contents of the block.
• The state of a block might change as a result of the
operations Read-Miss, Read-Hit, Write-Miss, and Write-
Hit.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
14. 4.4 Snooping Protocols
• Write-Invalidate and Write-Through
– Multiple processors can read block copies from main
memory safely until one processor updates its copy.
– At this time, all cache copies are invalidated and the
memory is updated to remain consistent.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
15. 4.4 Snooping Protocols
• Write-Invalidate and Write-Through
State Description
Valid The copy is consistent with global memory
[VALID]
Invalid The copy is inconsistent
[INV]
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
16. 4.4 Snooping Protocols
• Write-Invalidate and Write-Through
Event Actions
Read Hit Use the local copy from the cache.
Read Miss Fetch a copy from global memory. Set the state of this
copy to Valid.
Write Hit Perform the write locally. Broadcast an Invalid
command to all caches. Update the global memory.
Write Get a copy from global memory. Broadcast an invalid
Miss command to all caches. Update the global memory.
Update the local copy and set its state to Valid.
Replace Since memory is always consistent, no write back is
needed when a block is replaced.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
17. 4.4 Snooping Protocols
• Write-Invalidate and Write-Back (Ownership
Protocol)
– A valid block can be owned by memory and shared in multiple
caches that can contain only the shared copies of the block.
– Multiple processors can safely read these blocks from their
caches until one processor updates its copy.
– At this time, the writer becomes the only owner of the valid block
and all other copies are invalidated.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
18. 4.4 Snooping Protocols
• Write-Invalidate and Write-Back
(Ownership Protocol)
State Description
Shared Data is valid and can be read safely. Multiple copies
(Read-Only) can be in this state
[RO]
Exclusive Only one valid cache copy exists and can be
(Read- read from and written to safely. Copies in other
Write) [RW] caches are invalid
Invalid The copy is inconsistent
[INV]
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
19. 4.4 Snooping Protocols
• Write-Invalidate and Write-Back
Event Action
Read Hit Use the local copy from the cache.
Read Miss: If no Exclusive (Read-Write) copy exists, then
supply a copy from global memory. Set the state of
this copy to Shared (Read-Only). If an Exclusive
(Read-Write) copy exists, make a copy from the
cache that set the state to Exclusive (Read-Write),
update global memory and local cache with the copy.
Set the state to Shared (Read-Only) in both caches.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
20. 4.4 Snooping Protocols
• Write-Invalidate and Write-Back
Write Hit If the copy is Exclusive (Read-Write), perform the write
locally. If the state is Shared (Read-Only), then
broadcast an Invalid to all caches. Set the state to
Exclusive (Read-Write).
Write Miss Get a copy from either a cache with an Exclusive (Read-
Write) copy, or from global memory itself. Broadcast an
Invalid command to all caches. Update the local copy
and set its state to Exclusive (Read-Write).
Block If a copy is in an Exclusive (Read-Write) state, it has to
Replacement be written back to main memory if the block is being
replaced. If the copy is in Invalid or Shared (Read-Only)
states, no write back is needed when a block is replaced.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
21. 4.4 Snooping Protocols
• Write-Once
– This write-invalidate protocol, which was proposed by
Goodman in 1983 uses a combination of write-through
and write-back.
– Write-through is used the very first time a block is
written. Subsequent writes are performed using write
back.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
22. 4.4 Snooping Protocols
• Write-Once
State Description
Invalid The copy is inconsistent.
[INV]
Valid The copy is consistent with global memory.
[VALID]
Reserved Data has been written exactly once and the copy is consistent with
[RES] global memory. There is only one copy of the global memory block
in one local cache.
Dirty Data has been updated more than once and there is only one copy
[DIRTY] in one local cache. When a copy is dirty, it must be written back to
global memory
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
23. 4.4 Snooping Protocols
• Write-Once
Event Actions
Read Hit Use the local copy from the cache.
Read If no Dirty copy exists, then supply a copy from global
Miss memory. Set the state of this copy to Valid. If a dirty copy
exists, make a copy from the cache that set the state to
Dirty, update global memory and local cache with the
copy. Set the state to VALID in both caches.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
24. 4.4 Snooping Protocols
• Write-Once
Write Hit If the copy is Dirty or Reserved, perform the write locally,
and set the state to Dirty. If the state is Valid, then broadcast
an Invalid command to all caches. Update the global memory
and set the state to Reserved.
Write Get a copy from either a cache with a Dirty copy or from
Miss global memory itself. Broadcast an Invalid command to all
caches. Update the local copy and set its state to Dirty.
Block If a copy is in a Dirty state, it has to be written back to main
Replace memory if the block is being replaced. If the copy is in Valid,
ment Reserved, or Invalid states, no write back is needed when a
block is replaced.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
25. 4.4 Snooping Protocols
• Write-Update and Partial Write-Through
– In this protocol an update to one cache is written to memory at
the same time it is broadcast to other caches sharing the
updated block.
– These caches snoop on the bus and perform updates to their
local copies.
– There is also a special bus line, which is asserted to indicate that
at least one other cache is sharing the block.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
26. 4.4 Snooping Protocols
• Write-Update and Partial Write-Through
State Description
Valid This is the only cache copy and is consistent
Exclusive with global memory
[VAL-X]
Shared There are multiple caches copies shared. All copies
[SHARE] are consistent with memory
Dirty This copy is not shared by other caches and
[DIRTY] has been updated. It is not consistent with
global memory. (Copy ownership)
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
27. 4.4 Snooping Protocols
• Write-Update and Partial Write-Through
Event Action
Read Hit Use the local copy from the cache. State does not change
Read Miss: If no other cache copy exists, then supply a copy from
global memory. Set the state of this copy to Valid
Exclusive. If a cache copy exists, make a copy from the
cache. Set the state to Shared in both caches. If the cache
copy was in a Dirty state, the value must also be written
to memory.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
28. 4.4 Snooping Protocols
• Write-Update and Partial Write-Through
Write Hit Perform the write locally and set the state to Dirty. If the state
is Shared, then broadcast data to memory and to all caches and
set the state to Shared. If other caches no longer share the
block, the state changes from Shared to Valid Exclusion.
Write Miss The block copy comes from either another cache or from global
memory. If the block comes from another cache, perform the
update and update all other caches that share the block and
global memory. Set the state to Shared. If the copy comes from
memory, perform the write and set the state to Dirty.
Block If a copy is in a Dirty state, it has to be written back to main
Replacement memory if the block is being replaced. If the copy is in Valid
Exclusive or Shared states, no write back is needed when a
block is replaced.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
29. 4.4 Snooping Protocols
• Write-Update and Write-Back
– This protocol is similar to the previous one
except that instead of writing through to the
memory whenever a shared block is updated,
memory updates are done only when the
block is being replaced.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
30. 4.4 Snooping Protocols
• Write-Update and Write-Back
State Description
Valid This is the only cache copy and is consistent with
Exclusive global memory
[VAL-X]
Shared There are multiple caches copies shared.
Clean
[SH-CLN]
Shared There are multiple shared caches copies. This is the last
Dirty one being updated. (Ownership)
[SH-DRT]
Dirty This copy is not shared by other caches and has
[DIRTY] been updated. It is not consistent with global
memory. (Ownership)
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
31. 4.4 Snooping Protocols
• Write-Update and Write-Back
Event Action
Read Hit Use the local copy from the cache. State does not change
Read Miss: If no other cache copy exists, then supply a copy from global
memory. Set the state of this copy to Valid Exclusive. If a cache
copy exists, make a copy from the cache. Set the state to Shared
Clean. If the supplying cache copy was in a Valid Exclusion or
Shared Clean, its new state becomes Shared Clean. If the
supplying cache copy was in a Dirty or Shared Dirty state, its
new state becomes Shared Dirty.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
32. 4.4 Snooping Protocols
• Write-Update and Write-Back
Write Hit If the sate was Valid Exclusive or Dirty, Perform the write locally
and set the state to Dirty. If the state is Shared Clean or Shared
Dirty, perform update and change state to Shared Dirty.
Broadcast the updated block to all other caches. These caches
snoop the bus and update their copies and set their state to
Shared Clean.
Write Miss The block copy comes from either another cache or from global
memory. If the block comes from another cache, perform the
update, set the state to Shared Dirty, and broadcast the updated
block to all other caches. Other caches snoop the bus, update
their copies, and change their state to Shared Clean. If the copy
comes from memory, perform the write and set the state to Dirty.
Block If a copy is in a Dirty or Shared Dirty state, it has to be written
Replacement back to main memory if the block is being replaced. If the copy is
in Valid Exclusive, no write back is needed when a block is
replaced.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
33. 4.5 Directory Based Protocols
– Due to the nature of some interconnection networks
and the size of the shared memory system, updating
or invalidating caches using snoopy protocols might
become unpractical.
– Cache coherence protocols that somehow store
information on where copies of blocks reside are
called directory schemes.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
34. 4.5 Directory Based Protocols
– A directory is a data structure that maintains
information on the processors that share a memory
block and on its state.
– The information maintained in the directory could be
either centralized or distributed.
– A Central directory maintains information about all
blocks in a central data structure.
– The same information can be handled in a distributed
fashion by allowing each memory module to maintain
a separate directory.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
35. 4.5 Directory Based Protocols
• Protocol Categorization:
– Full Map Directories.
– Limited Directories.
– Chained Directories.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
36. 4.5 Directory Based Protocols
– Full-map Directories:
• Each directory entry contains N pointers, where N
is the number of processors.
• There could be N cached copies of a particular
block shared by all processors.
• For every memory block, an N bit vector is
maintained, where N equals the number of
processors in the shared memory system. Each bit
in the vector corresponds to one processor.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
37. 4.5 Directory Based Protocols
– Full-map Directories:
Memory Directory
X: 1 0 1 0 Data
Interconnection Network
X: Data X: Data
Cache C0 Cache C1 Cache C2 Cache C3
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
38. 4.5 Directory Based Protocols
– Limited Directories:
• Fixed number of pointers per directory entry
regardless of the number of processors.
• Restricting the number of simultaneously cached
copies of any block should solve the directory size
problem that might exist in full-map directories.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
39. 4.5 Directory Based Protocols
– Limited Directories:
Memory Directory
X: C0 C2 Data
Interconnection Network
X: Data X: Data
Cache C0 Cache C1 Cache C2 Cache C3
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
40. 4.5 Directory Based Protocols
– Chained Directories:
• Chained directories emulate full-map by
distributing the directory among the caches.
• Solving the directory size problem without
restricting the number of shared block copies.
• Chained directories keep track of shared copies of
a particular block by maintaining a chain of
directory pointers.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
41. 4.5 Directory Based Protocols
– Chained Directories:
Memory
Directory
X: C2 Data
Interconnection Network
X: CT Data X: C0 Data
Cache C0 Cache C1 Cache C2 Cache C3
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
43. 4.5 Directory Based Protocols
• Centralized Directory Invalidate
– Invalidating signals and a pointer to the requesting
processor are forwarded to all processors that have a
copy of the block.
– Each invalidated cache sends an acknowledgment to
the requesting processor.
– After the invalidation is complete, only the writing
processor will have a cache with a copy of the block.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
44. 4.5 Directory Based Protocols
• Centralized Directory Invalidate – Write by
P3
Memory Directory
Write-reply
X: 1 0 1 0 Data
Invalidate &
requester
Invalidate & Write
requester inv-ack
X: Data X: Data
Cache C0 Cache C1 Cache C2 Cache C3
inv-ack
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
45. 4.5 Directory Based Protocols
• Scalable Coherent Interface (SCI)
– Doubly linked list of distributed directories.
– Each cached block is entered into a list of processors
sharing that block.
– For every block address, the memory and cache
entries have additional tag bits. Part of the memory
tag identifies the first processor in the sharing list (the
head). Part of each cache tag identifies the previous
and following sharing list entries.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
46. 4.5 Directory Based Protocols
• Scalable Coherent Interface (SCI)
– Initially memory is in the uncached state and cached
copies are invalid.
– A read request is directed from a processor to the
memory controller. The requested data is returned to
the requester’s cache and its entry state is changed
from invalid to the head state.
– This changes the memory state from uncached to
cached.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
47. 4.5 Directory Based Protocols
• Scalable Coherent Interface (SCI)
– When a new requester directs its read request to
memory, the memory returns a pointer to the head.
– A cache-to-cache read request (called Prepend) is
sent from the requester to the head cache.
– On receiving the request, the head cache sets its
backward pointer to point to the requester’s cache.
– The requested data is returned to the requester’s
cache and its entry state is changed to the head state.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
48. 4.5 Directory Based Protocols
• Scalable Coherent Interface (SCI)-Sharing
list addition
Memory Before After
Memory
1) read
Cache C0 Cache C2
Cache C0 Cache C2
(head) (Invalid) (middle) (head)
2) prepend
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
49. 4.5 Directory Based Protocols
• Scalable Coherent Interface (SCI)
– The head of the list has the authority to purge other
entries in the list to obtain an exclusive (read-write)
entry.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
51. 4.5 Directory Based Protocols
• Scalable Distributed Directory (SDD)
– A singly linked list of distributed directories.
– Similar to the SCI protocol, memory points to the
head of the sharing list.
– Each processor points only to its predecessor.
– The sharing list additions and removals are handled
different from the SCI protocol .
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
52. 4.5 Directory Based Protocols
• Scalable Distributed Directory (SDD)
– On a read miss, a new requester sends a read-miss
message to memory.
– The memory updates its head pointers to point to the
requester and send a read-miss-forward signal to the
old head.
– On receiving the request, the old head returns the
requested data along with its address as a read-miss-
reply.
– When the reply is received, at the requester’s cache,
the data is copied and the pointer is made to point to
the old head.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
54. 4.5 Directory Based Protocols
• Scalable Distributed Directory (SDD)
– On a write miss, a requester sends a write-
miss message to memory.
– The memory updates its head pointers to
point to the requester and sends a write-miss-
forward signal to the old head.
– The old head invalidates itself, returns the
requested data as a write-miss-reply-data
signal, and send a write-miss-forward to the
next cache in the list.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
55. 4.5 Directory Based Protocols
• Scalable Distributed Directory (SDD)
– When the next cache receives the write-miss-forward
signal, it invalidates itself and sends a write-miss-
forward to the next cache in the list.
– When the write-miss-forward signal is received by the
tail or by a cache that no longer has copy of the
block, a write-miss-reply is sent to the requester.
– The write is complete when the requester receives
both write-miss-reply-data and write-miss-reply.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
56. 4.5 Directory Based Protocols
• Scalable Distributed Directory (SDD)- Write
Miss List Removal
Before After
Memory
Memory
1) write
2) write miss-forward
Cache C0 Cache C2 Cache C3
3) write miss-reply-data
(tail) (head) (invalid)
Cache C3
(exclusive)
3) write miss-forward
4) write miss-reply
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
58. 4.6 Shared Memory Programming
• Communication – Serial versus Parallel
Code
Code
Private Data
Data
Shared Data
Stack Private Stack
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
59. 4.6 Shared Memory Programming
• Communication – Via shared data
Code Access Access Code
Private Data Private Data
Shared Data
Private Stack Private Stack
Process 1 Process 2
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
62. 4.7 Summary
• A shared memory system is made of multiple processors
and memory modules that are connected via some
interconnection network.
• Shared memory multiprocessors are usually bus-based
or switch-based.
• Communication among processors is achieved by writing
to and reading from memory.
• Synchronization among processors is achieved using
locks and barriers.
• Performance of a shared memory system becomes an
issue when the interconnection network connecting the
processors to global memory becomes a bottleneck.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
63. 4.7 Summary
• Local caches are used to alleviate the bottleneck
problem.
• But the introduction of caches created a consistency
problem among caches and between memory and
caches.
• Cache coherence schemes can be categorized into 2
main categories:
– Snooping protocols.
– Directory-based protocols.
• The programming model in shared memory systems has
proven to be easier to program compared to message
passing systems.
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr