Scuola Politecnica
Dipartimento di Ingegneria Chimica,
Gestionale, Informatica, Meccanica
Parallel Computer Architectures
Shared-Memory Multiprocessors
Architetture Avanzate dei Calcolatori
Salvatore La Bua
S. La Bua - DICGIM/UniPA Architetture Avanzate dei Calcolatori
Shared Memory Multiprocessors
● Multiprocessors
● Multicomputers
Taxonomy of Parallel Computers
● SISD
– Single Instruction, Single Data
● Von Neumann architecture
● SIMD
– Single Instruction, Multiple Data
● Vector and Array processor architectures
● MISD
– Multiple Instruction, Single Data
● MIMD
– Multiple Instruction, Multiple Data
● Multiprocessor architectures: UMA, NUMA, COMA
● Multicomputer architectures: MPP, COW
Memory Semantics: Consistency Models
● Strict consistency
– Any read to a location x always returns the value of the most recent write to x
● Sequential consistency
– When several CPUs issue read and write requests at once, the hardware chooses some interleaving of them
– All CPUs observe that same order
● Processor consistency
– Writes by any CPU are seen by all in the order they were issued
– For every memory word, all CPUs see writes to it in the same order
● Weak consistency
– Writes from a single CPU are not guaranteed to be seen in order; ordering is enforced only at explicit synchronization operations
● Release consistency
– Refines weak consistency by splitting synchronization into separate acquire (entering a critical region) and release (leaving it) operations
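As a rough illustration of the sequential consistency bullet, the sketch below enumerates every interleaving of two tiny two-instruction programs and collects the values the reads can return. The programs, the variable names x, y, r0, r1, and the helper names `interleavings`, `run`, and `sc_outcomes` are assumptions made for this example; they do not come from the slides.

```python
from itertools import combinations

# Hypothetical test programs: each CPU writes its own flag,
# then reads the other CPU's flag into a private register.
CPU0 = [("write", "x", 1), ("read", "y", "r0")]
CPU1 = [("write", "y", 1), ("read", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each program's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        # Positions listed in `slots` take the next op of a, the rest of b.
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for op, addr, arg in schedule:
        if op == "write":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return regs["r0"], regs["r1"]


def sc_outcomes():
    """All (r0, r1) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(CPU0, CPU1)}


if __name__ == "__main__":
    print(sorted(sc_outcomes()))  # [(0, 1), (1, 0), (1, 1)]
```

The result (0, 0) never appears: in any single interleaving, at least one write precedes the other CPU's read. Weaker models that let a read complete before an earlier buffered write becomes visible may allow (0, 0) as well.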
UMA Symmetric Multiprocessor Architectures
● Uniform Memory Access
● Snooping Caches
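● Coherence protocols
– Write-through: every write is propagated to memory immediately
– Write-allocate: a write-miss policy in which the line is loaded into the cache on a write miss
– Write-back: a modified line is written to memory later, for example when it is evicted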
MESI Cache Coherence Protocol
● MESI: Modified-Exclusive-Shared-Invalid
– A write-back protocol
● Four states each cache entry can be in:
– Invalid
● The cache entry does not contain valid data
– Shared
● Multiple caches may hold the line
● Memory is up to date
– Exclusive
● No other cache holds the line
● Memory is up to date
– Modified
● The entry is valid
● Memory is out of date
● No other cache holds a copy
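The four states above can be sketched as a small state machine for a single memory line shared by several snooping caches. This is a simplified model under stated assumptions, not the slides' own implementation: the class name `MesiLine`, its methods, and the `writebacks` counter are hypothetical, bus transactions are modelled as direct updates, and evictions are ignored.

```python
class MesiLine:
    """One memory line tracked across n snooping caches (illustrative only)."""

    def __init__(self, n_caches):
        self.state = ["I"] * n_caches  # every cache starts Invalid
        self.writebacks = 0            # times a Modified copy was flushed to memory

    def _flush_if_modified(self, cpu):
        # A Modified copy is the only up-to-date one, so it must reach memory
        # before another cache may read or take ownership of the line.
        if self.state[cpu] == "M":
            self.writebacks += 1

    def read(self, cpu):
        if self.state[cpu] != "I":
            return  # read hit: no state change
        others = [c for c, s in enumerate(self.state) if s != "I"]
        for c in others:
            self._flush_if_modified(c)
            self.state[c] = "S"  # holders snoop the read and drop to Shared
        # Exclusive if nobody else holds the line, Shared otherwise.
        self.state[cpu] = "S" if others else "E"

    def write(self, cpu):
        for c in range(len(self.state)):
            if c != cpu:
                self._flush_if_modified(c)
                self.state[c] = "I"  # all other copies are invalidated
        self.state[cpu] = "M"


if __name__ == "__main__":
    line = MesiLine(3)
    line.read(0);  print(line.state)  # ['E', 'I', 'I']
    line.read(1);  print(line.state)  # ['S', 'S', 'I']
    line.write(1); print(line.state)  # ['I', 'M', 'I']
    line.read(2);  print(line.state)  # ['I', 'S', 'S']
```

Note that a write from the Exclusive state reaches Modified without invalidating anyone, because no other cache holds the line. Avoiding that traffic is the reason for keeping Exclusive separate from Shared.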
UMA Multiprocessors
● Using crossbar switches
● Using multistage switching networks
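To make the trade-off between the two interconnects more concrete, the sketch below compares hardware cost and shows destination-based routing. It assumes the multistage network is an omega network built from 2x2 switches, which the slide does not state; the function names are hypothetical.

```python
def crossbar_crosspoints(n_cpus, n_memories):
    """A crossbar needs one crosspoint per (CPU, memory module) pair."""
    return n_cpus * n_memories


def _stages(n):
    """Number of stages for n endpoints; n must be a power of two >= 2."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    return n.bit_length() - 1  # log2(n)


def omega_switches(n):
    """An n x n omega network has log2(n) stages of n/2 two-by-two switches."""
    return _stages(n) * (n // 2)


def omega_route(dest, n):
    """Per-stage output port for a request to memory `dest`.

    Each stage looks at one bit of the destination, most significant
    first: 0 selects the upper output, 1 the lower output.
    """
    stages = _stages(n)
    if not 0 <= dest < n:
        raise ValueError("dest out of range")
    return [(dest >> (stages - 1 - s)) & 1 for s in range(stages)]


if __name__ == "__main__":
    print(crossbar_crosspoints(8, 8))  # 64
    print(omega_switches(8))           # 12
    print(omega_route(6, 8))           # [1, 1, 0]
```

The crossbar grows quadratically but is non-blocking. The omega network is far cheaper, yet two requests that need the same switch output at some stage conflict and one of them has to wait.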
NUMA Multiprocessors
● Non-Uniform Memory Access
– Single address space visible to all CPUs
– Access to remote memory is done using ordinary LOAD and STORE instructions
– Access to remote memory is slower than access to local memory
● NC-NUMA
– Non Cache coherent NUMA
● CC-NUMA
– Cache Coherent NUMA
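The cost of slower remote accesses can be estimated with a simple weighted average. The sketch below is a back-of-the-envelope model; the function name and the latencies used in the example are made-up assumptions, not measurements of any real machine.

```python
def average_access_ns(local_ns, remote_ns, remote_fraction):
    """Mean memory access time when a given fraction of accesses is remote."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    # Hypothetical latencies: 100 ns local, 300 ns remote, 25% remote accesses.
    print(average_access_ns(100, 300, 0.25))  # 150.0
```

Under these assumed numbers, a quarter of the accesses going remote raises the mean access time by half, which is why placing data close to the CPU that uses it matters on NUMA machines.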
Sun Fire E25K NUMA Multiprocessor
● An example of a shared-memory NUMA multiprocessor
COMA Multiprocessors
● Cache Only Memory Access
– Use each CPU’s main memory as a cache
– Physical address space split into cache lines
● Problems:
– How are cache lines located?
● A line has no fixed home: it may be in some node's main memory or in an actual cache
– When a line is purged, what happens if it is the last copy?
● The last copy cannot be thrown out
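One possible way to handle both problems is sketched below: a directory records which nodes hold each line, and evicting the last copy migrates it to another node instead of discarding it. This is only one conceivable policy, chosen for illustration; the class `ComaSystem`, its methods, and the migration rule are assumptions, not something the slides specify.

```python
class ComaSystem:
    """Toy COMA model: each node's memory is a small cache of lines."""

    def __init__(self, n_nodes, capacity):
        self.capacity = capacity                      # lines per node
        self.nodes = [set() for _ in range(n_nodes)]  # lines held by each node
        self.directory = {}                           # line -> set of holder node ids

    def place(self, node, line):
        """Store a copy of `line` in `node`'s memory."""
        if line not in self.nodes[node] and len(self.nodes[node]) >= self.capacity:
            raise RuntimeError("node memory is full")
        self.nodes[node].add(line)
        self.directory.setdefault(line, set()).add(node)

    def locate(self, line):
        """Answer 'where is this line?' via the directory."""
        return sorted(self.directory.get(line, ()))

    def evict(self, node, line):
        """Purge `line` from `node`; returns the node it migrated to, or None."""
        holders = self.directory.get(line, set())
        if node not in holders:
            raise KeyError("node does not hold this line")
        self.nodes[node].discard(line)
        holders.discard(node)
        if holders:
            return None  # other copies survive, so dropping this one is safe
        # Last copy: it must not be thrown out, so move it to a node with room.
        for other, held in enumerate(self.nodes):
            if other != node and len(held) < self.capacity:
                held.add(line)
                holders.add(other)
                return other
        # Nowhere to go: restore the copy and report the failure.
        self.nodes[node].add(line)
        holders.add(node)
        raise RuntimeError("no room to migrate the last copy")


if __name__ == "__main__":
    system = ComaSystem(n_nodes=2, capacity=1)
    system.place(0, "A")
    system.place(1, "A")
    print(system.evict(0, "A"), system.locate("A"))  # None [1]
    print(system.evict(1, "A"), system.locate("A"))  # 0 [0]
```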
