A PPT on “SHARED MEMORY”. Made by: Sanjana Bakshi, 7IT087
Topics to be covered: DSM system; shared memory; on-chip memory; bus-based multiprocessors; working with a cache; write-through cache; write-once protocol; ring-based multiprocessors; protocol used; similarities and differences between ring-based and bus-based multiprocessors.
What is a DSM system? In a distributed shared memory (DSM) system, a collection of workstations connected by a LAN (a distributed-memory system, often called a multicomputer) shares a single paged virtual address space. Each page is present on exactly one machine. An attempt to reference a page on a different machine causes a hardware page fault, which traps to the operating system. The OS then sends a message to the remote machine, which finds the needed page and sends it back to the requesting processor.
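The page-fault path described above can be sketched as a small simulation. This is a toy model under stated assumptions, not an operating-system implementation: the `Machine` class, the shared `owner` directory, and the 4096-byte page size are hypothetical, and the "message" to the remote machine is just a method call.

```python
PAGE_SIZE = 4096  # assumed page size, chosen only for illustration


class Machine:
    """One workstation in a toy DSM system (hypothetical model)."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner   # shared directory: page number -> Machine holding it
        self.pages = {}      # pages currently resident on this machine
        self.faults = 0      # page faults taken by this machine

    def add_page(self, page_no):
        self.pages[page_no] = bytearray(PAGE_SIZE)
        self.owner[page_no] = self

    def send_page(self, page_no, requester):
        # Remote side: find the page and send it to the requester.
        # Afterwards the page exists on exactly one machine, the requester.
        requester.pages[page_no] = self.pages.pop(page_no)
        self.owner[page_no] = requester

    def _ensure_local(self, page_no):
        if page_no not in self.pages:
            # "Page fault": ask the machine that holds the page to send it.
            self.faults += 1
            self.owner[page_no].send_page(page_no, self)

    def read(self, addr):
        page_no, offset = divmod(addr, PAGE_SIZE)
        self._ensure_local(page_no)
        return self.pages[page_no][offset]

    def write(self, addr, value):
        page_no, offset = divmod(addr, PAGE_SIZE)
        self._ensure_local(page_no)
        self.pages[page_no][offset] = value


owner = {}
a, b = Machine("A", owner), Machine("B", owner)
b.add_page(0)
b.write(10, 42)      # local access on B, no fault
value = a.read(10)   # A faults; B sends page 0 to A
```

After the last line, page 0 lives only on machine A, so a later access from B would fault and pull the page back.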
What is shared memory? Shared memory is memory that can be accessed simultaneously by more than one CPU (processor). Each processor has a local cache. Accessing a cache is cheaper than accessing main memory. Shared memory is simple to program but hard to scale.
Various architectures to be discussed: on-chip memory; bus-based multiprocessors; ring-based multiprocessors.
On-Chip Memory: In this design, the CPU portion of the chip has address and data lines that connect directly to the memory portion. Such chips are used in cars, appliances, and even toys. In a hypothetical shared-memory multiprocessor, multiple CPUs would directly share the same memory, but such a chip would be complicated and expensive.
On-Chip Memory: [Figure: (a) A single-chip computer: a chip package containing a CPU and memory, with address and data lines connecting the CPU to the memory. (b) A hypothetical shared-memory multiprocessor: CPU1, CPU2, CPU3, and CPU4 sharing one memory.]
What is a bus? A bus is a collection of parallel wires: some hold the address the CPU wants to read or write, some send or receive data, and the rest control the transfers. In most systems, buses are external and are used to connect CPUs, memories, and I/O controllers.
Bus-based multiprocessors: SMP (symmetric multiprocessing). All CPUs are connected to one bus (backplane). Memory and peripherals are accessed via the shared bus. The system looks the same from any processor. [Figure: CPU A, CPU B, memory, and device I/O attached to one bus.]
Bus-based multiprocessors: dealing with bus overload. Add local memory (a cache) to each CPU. The CPU reads and writes its cache and accesses main memory only on a cache miss. [Figure: CPU A and CPU B, each with its own cache, share a bus with memory and device I/O.]
Working with a cache: CPU A reads location 12345 from memory. [Figure: memory holds 12345: 7; CPU A's cache holds 12345: 7; CPU B's cache is empty.]
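Working with a cache: CPU A has changed its cached copy of location 12345 to 3, but memory still holds 7. CPU B now reads location 12345 from memory and gets the old value. Memory is not coherent! [Figure: memory holds 12345: 7; CPU A's cache holds 12345: 3; CPU B's cache holds 12345: 7.]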
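Write-through cache (continued): With a write-through cache, every write also updates main memory, so memory already holds the 3 that CPU A wrote. CPU B reads location 12345 from memory and loads it into its cache. [Figure: memory holds 12345: 3; CPU A's cache holds 12345: 3; CPU B's cache holds 12345: 3.]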
Write-through cache: CPU A modifies location 12345 from 3 to 0, writing through to memory. The cache on CPU B is not updated, so it still holds the old value. Memory is not coherent! [Figure: memory holds 12345: 0; CPU A's cache holds 12345: 0; CPU B's cache holds 12345: 3.]
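The cache scenarios on the preceding slides can be replayed in a few lines. This is a minimal sketch: the `Cache` class and its `write_through` flag are hypothetical names, and the bus is reduced to direct access to a shared `memory` dictionary. It shows a stale read from memory without write-through, and a stale cached copy with write-through but no invalidation.

```python
class Cache:
    """A private cache in front of a shared memory (hypothetical model)."""

    def __init__(self, memory, write_through):
        self.memory = memory
        self.write_through = write_through
        self.lines = {}  # address -> cached value

    def read(self, addr):
        if addr not in self.lines:              # cache miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]                 # cache hit: no memory access

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:                  # write-through: update memory too
            self.memory[addr] = value


def without_write_through():
    memory = {12345: 7}
    a, b = Cache(memory, False), Cache(memory, False)
    a.read(12345)
    a.write(12345, 3)        # only A's cache changes
    return b.read(12345)     # B reads memory, which still holds the old value


def with_write_through():
    memory = {12345: 7}
    a, b = Cache(memory, True), Cache(memory, True)
    a.read(12345)
    a.write(12345, 3)        # memory now holds 3
    first = b.read(12345)    # B loads 3 into its cache
    a.write(12345, 0)        # memory and A's cache now hold 0
    second = b.read(12345)   # B hits its own cache, which was never updated
    return first, second, memory[12345]


stale_from_memory = without_write_through()
first, second, in_memory = with_write_through()
```

In the second scenario memory is up to date, yet CPU B still sees the old value, which is why the caches also need to snoop on the bus.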
Write-once protocol: This protocol manages cache blocks, each of which can be in one of the following three states:
INVALID: This cache block does not contain valid data.
CLEAN: Memory is up to date; the block may be in other caches.
DIRTY: Memory is incorrect; no other cache holds the block.
The basic idea of the protocol is that a word that is being read by multiple CPUs is allowed to be present in all their caches. A word that is being heavily written by only one machine is kept in its cache and not written back to memory on every write, which reduces bus traffic.
Write-through protocol
Event | Action taken by a cache in response to its own CPU’s operation | Action taken by a cache in response to a remote CPU’s operation
Read miss | Fetch data from memory and store in cache | No action
Read hit | Fetch data from local cache | No action
Write miss | Update data in memory and store in cache | No action
Write hit | Update memory and cache | Invalidate cache entry
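The write-through actions listed above can be sketched as a snooping cache. This is an illustrative model, not a hardware description: `Bus` and `SnoopyCache` are hypothetical names, and the remote column is read here as "a remote write to an address this cache holds invalidates the entry; otherwise the cache does nothing".

```python
class Bus:
    """Shared bus with main memory attached (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def write(self, source, addr, value):
        self.memory[addr] = value            # every write reaches memory
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)      # other caches see the write


class SnoopyCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                      # address -> cached value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:           # read miss: fetch and store
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]              # read hit: use local copy

    def write(self, addr, value):
        # Write hit and write miss: update the cache and memory.
        self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr):
        # Remote write: invalidate our entry if we hold one, else no action.
        self.lines.pop(addr, None)


bus = Bus({12345: 7})
a, b = SnoopyCache(bus), SnoopyCache(bus)
a.read(12345)
b.read(12345)
a.write(12345, 3)               # B snoops the write and drops its copy
b_was_invalidated = 12345 not in b.lines
reread = b.read(12345)          # read miss, so B fetches the new value
```

Compared with a plain write-through cache, the only addition is `snoop_write`, which removes the stale copy so that the next read goes back to memory.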
For example: (a) Initial state: word W, containing value W1, is in memory and is also cached by B (CLEAN). Memory is correct. (b) A reads word W and gets W1. B does not respond to the read, but the memory does. A and B now both hold W1 as CLEAN. Memory is correct.
(c) A writes a value W2. B snoops on the bus, sees the write, and invalidates its entry. A’s copy is marked DIRTY. Memory is not updated and still holds W1. (d) A writes W again, giving it the value W3. This and subsequent writes by A are done locally, without any bus traffic. Memory is not updated.
(e) C reads or writes W. A sees the request by snooping on the bus, provides the value, and invalidates its own entry. C now has the only valid copy (W3, DIRTY); the entries in A and B are INVALID. Memory is not updated.
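Steps (a) through (e) can be traced with a small state machine for a single word. This sketch follows the example as it is worded on these slides (a write marks the writer's copy DIRTY and leaves memory alone); the `WriteOnceBus` class and its counter are hypothetical, and other descriptions of the protocol may differ in detail.

```python
INVALID, CLEAN, DIRTY = "INVALID", "CLEAN", "DIRTY"


class WriteOnceBus:
    """Tracks one word W across several caches (hypothetical model)."""

    def __init__(self, cpus, memory_value):
        self.memory = memory_value
        self.state = {cpu: INVALID for cpu in cpus}
        self.value = {cpu: None for cpu in cpus}
        self.bus_transactions = 0

    def _dirty_holder(self, requester):
        for cpu, state in self.state.items():
            if cpu != requester and state == DIRTY:
                return cpu
        return None

    def read(self, cpu):
        if self.state[cpu] == INVALID:
            self.bus_transactions += 1
            holder = self._dirty_holder(cpu)
            if holder is None:                   # memory answers the read
                self.value[cpu] = self.memory
                self.state[cpu] = CLEAN
            else:                                # dirty holder supplies the value
                self.value[cpu] = self.value[holder]
                self.state[holder] = INVALID
                self.state[cpu] = DIRTY
        return self.value[cpu]

    def write(self, cpu, new_value):
        if self.state[cpu] != DIRTY:
            self.bus_transactions += 1           # others snoop and invalidate
            for other in self.state:
                if other != cpu:
                    self.state[other] = INVALID
            self.state[cpu] = DIRTY
        self.value[cpu] = new_value              # DIRTY writes stay local


bus = WriteOnceBus(["A", "B", "C"], "W1")
bus.read("B")                        # (a) B caches W1, CLEAN
a_value = bus.read("A")              # (b) memory answers, A is CLEAN
bus.write("A", "W2")                 # (c) B invalidates, A is DIRTY
traffic_after_c = bus.bus_transactions
bus.write("A", "W3")                 # (d) local write, no bus traffic
traffic_after_d = bus.bus_transactions
c_value = bus.read("C")              # (e) A supplies W3 and invalidates itself
```

The two traffic counters are equal, which is the point of step (d): once a block is DIRTY in one cache, repeated writes cost nothing on the bus.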
Ring-Based Multiprocessors: Memnet. [Figure: CPUs connected in a ring. Each machine contains an MMU (memory management unit), a cache, home memory, and private memory. The block table has one entry per block (0, 1, 2, 3) with the fields Valid, Exclusive, Home, Interrupt, and Location.]
Protocol: Read. When the CPU wants to read a word from shared memory, the memory address to be read is passed to the Memnet device, which checks the block table to see if the block is present. If so, the request is satisfied. If not, the Memnet device waits until it captures the circulating token and then puts a request packet onto the ring. As the packet passes around the ring, each Memnet device along the way checks to see if it has the block needed. If so, it puts the block in the dummy field and modifies the packet header to inhibit subsequent machines from doing so. If the requesting machine has no free space in its cache to hold the incoming block, it makes space by picking a cached block at random and sending it home. Blocks whose Home bit is set are never chosen because they are already home.
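The read path above can be sketched as follows. This is a simplified model under several assumptions: `MemnetDevice`, `capacity` (assumed to be at least 1), and the shared `home_of` map are hypothetical; the token is omitted, so the ring is just a list visited in order; and clearing the supplier's Exclusive bit is an assumption made because a second copy now exists.

```python
import random


class MemnetDevice:
    def __init__(self, name, capacity, home_of):
        self.name = name
        self.capacity = capacity  # how many non-home blocks the cache can hold
        self.home_of = home_of    # shared map: block number -> home device
        self.table = {}           # block number -> {"data", "home", "exclusive"}
        self.ring = []            # all devices in ring order

    def add_home_block(self, block, data):
        self.table[block] = {"data": data, "home": True, "exclusive": True}
        self.home_of[block] = self

    def _cached(self):
        # Blocks with the Home bit set are never eviction candidates.
        return [b for b, e in self.table.items() if not e["home"]]

    def _send_home(self, block):
        entry = self.table.pop(block)
        entry["home"] = True
        self.home_of[block].table[block] = entry

    def read(self, block):
        if block in self.table:                  # present: satisfied locally
            return self.table[block]["data"]
        packet = {"block": block, "data": None, "filled": False}
        start = self.ring.index(self)
        for step in range(1, len(self.ring)):    # packet travels once around
            device = self.ring[(start + step) % len(self.ring)]
            entry = device.table.get(block)
            if entry is not None and not packet["filled"]:
                packet["data"] = entry["data"]   # fill the dummy field
                packet["filled"] = True          # inhibit later devices
                entry["exclusive"] = False       # assumption: no longer the only copy
        if not packet["filled"]:
            raise KeyError(f"block {block} not found on the ring")
        if len(self._cached()) >= self.capacity: # no free space: evict at random
            self._send_home(random.choice(self._cached()))
        self.table[block] = {"data": packet["data"], "home": False,
                             "exclusive": False}
        return packet["data"]


home_of = {}
a, b, c = (MemnetDevice(n, 1, home_of) for n in "ABC")
for device in (a, b, c):
    device.ring = [a, b, c]
a.add_home_block(0, "x0")
b.add_home_block(2, "x2")
first = c.read(0)    # fetched from A over the ring
second = c.read(2)   # C's cache is full, so block 0 is sent home to A first
```

With a capacity of one block, the second read forces the eviction path: block 0 goes back to its home machine A before block 2 is stored.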
Protocol: Write. There are three cases. (1) If the block containing the word to be written is present and is the only copy in the system (i.e., the Exclusive bit is set), the word is just written locally. (2) If the needed block is present but is not the only copy, an invalidation packet is first sent around the ring to force all other machines to discard their copies of the block about to be written. When the invalidation packet arrives back at the sender, the Exclusive bit is set for that block and the write proceeds locally. (3) If the block is not present, a packet is sent out that combines a read request and an invalidation request. The first machine that has the block copies it into the packet and discards its own copy. All subsequent machines just discard the block from their caches. When the packet comes back to the sender, the block is stored there and the word is written.
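The three write cases can be sketched in the same style. Again this is only an illustration: `MemnetNode` and `connect` are hypothetical names, the token and the Home bit are left out, and each block is modelled as a list of words so that a single word can be written after the block arrives.

```python
class MemnetNode:
    def __init__(self, name):
        self.name = name
        self.table = {}  # block number -> {"words": list, "exclusive": bool}
        self.ring = []   # all nodes in ring order, set by connect()

    def _others(self):
        # The nodes a packet visits, in ring order, before returning here.
        i = self.ring.index(self)
        return self.ring[i + 1:] + self.ring[:i]

    def write(self, block, offset, value):
        entry = self.table.get(block)
        if entry is not None and entry["exclusive"]:
            path = "local"                       # only copy: just write
        elif entry is not None:
            for node in self._others():          # invalidation packet
                node.table.pop(block, None)
            entry["exclusive"] = True
            path = "invalidate"
        else:
            words = None
            for node in self._others():          # read request + invalidation
                dropped = node.table.pop(block, None)
                if dropped is not None and words is None:
                    words = dropped["words"]     # first holder fills the packet
            if words is None:
                raise KeyError(f"block {block} not found on the ring")
            entry = {"words": words, "exclusive": True}
            self.table[block] = entry
            path = "read+invalidate"
        entry["words"][offset] = value
        return path


def connect(nodes):
    for node in nodes:
        node.ring = list(nodes)


a, b, c = MemnetNode("A"), MemnetNode("B"), MemnetNode("C")
connect([a, b, c])
a.table[5] = {"words": [1, 2, 3, 4], "exclusive": False}
b.table[5] = {"words": [1, 2, 3, 4], "exclusive": False}
paths = [c.write(5, 0, 9), c.write(5, 1, 8)]   # block absent, then exclusive
a.table[6] = {"words": [0, 0], "exclusive": False}
b.table[6] = {"words": [0, 0], "exclusive": False}
paths.append(a.write(6, 0, 7))                 # present but shared
```

The returned strings name which of the three cases was taken, which makes it easy to see that the second write to the same block needs no ring traffic.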
Similarities between bus-based and ring-based multiprocessors: In both cases, read operations always return the value most recently written. In both designs, a block may be absent from a cache, present in multiple caches for reading, or present in a single cache for writing.
Differences between the two multiprocessors. Bus-based multiprocessors: They are tightly coupled, with the CPUs normally in a single rack. They have a separate global memory. Ring-based multiprocessors: The machines can be much more loosely coupled, and this loose coupling can affect their performance. They have no separate global memory.
The end.
