Shared Memory: UMA and NUMA

Uniform memory access (UMA)
Each processor has uniform access time to memory; such machines are also
known as symmetric multiprocessors (SMPs).
(Example: Sun ES1000)

Non-uniform memory access (NUMA)
Memory access time depends on the location of the data; such machines are
also known as distributed shared-memory machines. Local access is faster
than non-local access, and NUMA machines are easier to scale than SMPs.
(Example: SGI Origin 2000)

[Figures: an SMP (four processors P sharing a BUS to a single Memory) and
a NUMA machine (two such processor/bus/memory nodes joined by a Network).]
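As a rough illustration of the UMA/NUMA distinction, a one-line cost model shows why NUMA performance hinges on the fraction of accesses that stay local (the cycle counts below are made up for illustration, not measurements from any machine named on this slide):

```python
# Illustrative average-latency model (invented cycle counts):
# UMA: every access costs the same; NUMA: local is fast, remote is slow.
def avg_latency_uma(uniform_cost=100):
    return uniform_cost

def avg_latency_numa(local_fraction, local_cost=60, remote_cost=300):
    # Weighted average of local and remote access costs.
    return local_fraction * local_cost + (1 - local_fraction) * remote_cost

print(avg_latency_uma())       # 100
print(avg_latency_numa(0.9))   # 84.0  -- mostly-local NUMA beats UMA
print(avg_latency_numa(0.5))   # 180.0 -- poor locality is much worse
```

With good locality NUMA wins; with poor locality it loses badly, which previews the "minimize access of far memory" advice later in the deck.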
SMPs: Memory Access Problems
• Conventional wisdom is that shared-memory systems do not
scale well
– Bus-based systems can become saturated
– Large, fast crossbars are expensive
SMPs: Memory Access Problems
• Cache coherence: when a processor modifies a shared variable in its
local cache, different processors may hold different values of that
variable. Several mechanisms have been developed for handling the
cache coherence problem
• Cache coherence problem
– Copies of a variable can be present in multiple caches
– A write by one processor may not become visible to others
– The others keep accessing the stale value in their caches
– Actions must be taken to ensure visibility, i.e. cache coherence
Cache Coherence Problem

[Figure: three processors P1, P2, P3, each with a cache ($), share a bus
to Memory and I/O devices; u = 5 in memory. Events: (1) P1 reads u and
(2) P3 reads u, so both caches hold u:5; (3) P3 writes u = 7 into its own
cache; (4) and (5) P1 and P2 then ask "u = ?" and still get the old
value 5.]
• Processors see different values for u after event 3
• Processes accessing main memory may see a very stale
value
• Unacceptable to programs, and frequent!
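The event sequence above can be reproduced with a toy Python model of private per-processor caches with no coherence mechanism at all (a sketch for illustration, not a real protocol):

```python
# Toy model: each processor has a private cache; writes go only to the
# writer's cache (no coherence), which exposes stale reads.
class Processor:
    def __init__(self, memory):
        self.cache = {}          # private cache: address -> value
        self.memory = memory     # shared main memory

    def read(self, addr):
        # Miss: fetch from memory. Hit: return the (possibly stale) copy.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Write only to the local cache; memory and other caches unchanged.
        self.cache[addr] = value

memory = {"u": 5}
p1, p2, p3 = Processor(memory), Processor(memory), Processor(memory)

p1.read("u")         # event 1: P1 caches u = 5
p3.read("u")         # event 2: P3 caches u = 5
p3.write("u", 7)     # event 3: P3 writes u = 7 in its cache only

print(p3.read("u"))  # 7 -- P3 sees its own write
print(p1.read("u"))  # 5 -- event 4: P1 reads a stale cached copy
print(p2.read("u"))  # 5 -- event 5: P2 fetches from stale main memory
```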
Snooping-Based Coherence
• Basic idea:
– Transactions on memory are visible to all processors
– Processors (or cache controllers acting as their representatives)
can snoop (monitor) the bus and take action on relevant events
• Implementation
– When a processor writes a value, a signal is sent over the
bus
– The signal is either
• Write invalidate - tell the other caches their copy is
invalid, so they must re-fetch it before the next use
• Write broadcast/update - continuously send the other caches
the new value as it is updated
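A minimal sketch of the write-invalidate variant in Python (illustrative only; real snooping hardware such as a MESI controller tracks per-cache-line states, which this toy omits):

```python
# Sketch of write-invalidate snooping: every cache watches a shared
# "bus"; on a write, all other cached copies of the address are dropped.
class Bus:
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, writer, addr):
        for cache in self.caches:
            if cache is not writer:
                cache.store.pop(addr, None)   # snoop and invalidate

class SnoopingCache:
    def __init__(self, memory, bus):
        self.store = {}
        self.memory = memory
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.store:            # miss: fetch from memory
            self.store[addr] = self.memory[addr]
        return self.store[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)  # signal on the bus
        self.store[addr] = value
        self.memory[addr] = value             # write-through, for simplicity

memory = {"u": 5}
bus = Bus()
c1, c3 = SnoopingCache(memory, bus), SnoopingCache(memory, bus)
c1.read("u"); c3.read("u")   # both cache u = 5
c3.write("u", 7)             # invalidates c1's copy, updates memory
print(c1.read("u"))          # 7 -- c1 misses and re-fetches the new value
```

Unlike the previous sketch, the stale-read problem disappears: the invalidation forces the other cache to miss and pick up the fresh value.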
SMP Machines
• Various Suns such as the E10000/HPC10000
• Various Intel and Alpha servers
• Crays: T90, J90, SV1
“Distributed-shared” Memory (cc-NUMA)
• Consists of N processors and a global address
space
– All processors can see all memory
– Each processor has some amount of local memory
– Access to the memory of other processors is slower
– Cache coherence (‘cc’) must be maintained across the
network as well as within nodes
• Non-Uniform Memory Access

[Figure: two bus-based SMP nodes (P P P P, BUS, Memory), each with its
own local memory, connected by a Network.]
cc-NUMA Memory Issues
• Easier to build than an SMP with the same number of processors,
because access to remote memory is allowed to be slower
(crossbars/buses can be less expensive than if all processors
were in one SMP box)
• Possible to create a much larger shared-memory system than with
an SMP
• Similar cache problems as in SMPs, but coherence must now also be
enforced across the network!
• Code writers should be aware of data distribution
– Load balance
– Minimize access of "far" memory
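The "minimize access of far memory" point can be made concrete with a toy access-counting model (hypothetical block vs. cyclic data distributions; the sizes are arbitrary):

```python
# Count remote accesses when an array is distributed across cc-NUMA
# nodes and each node sweeps its own contiguous quarter of the array.
def count_remote(accesses, owner):
    """accesses: iterable of (node, index); owner(index) -> home node."""
    return sum(1 for node, idx in accesses if owner(idx) != node)

N, P = 1000, 4
block = N // P
block_owner = lambda i: i // block    # block distribution
cyclic_owner = lambda i: i % P        # cyclic distribution

# Node k touches indices [k*block, (k+1)*block) -- its "own" work.
accesses = [(node, idx) for node in range(P)
            for idx in range(node * block, (node + 1) * block)]

print(count_remote(accesses, block_owner))   # 0   -- all accesses local
print(count_remote(accesses, cyclic_owner))  # 750 -- 3/4 go over the network
```

When the data distribution matches the access pattern, every access is local; when it does not, most accesses pay the remote penalty.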
cc-NUMA Machines
• SGI Origin 2000
• HP X2000, V2500
Distributed Memory
• Each of N processors has its own memory
• Memory is not shared
• Communication occurs using messages
– Communication networks/interconnects
• Custom
– Many manufacturers offer custom interconnects
• Off the shelf
– Ethernet
– ATM
– HiPPI
– Fibre Channel
– FDDI
[Figure: six nodes, each a processor (P) with its own private memory (M),
connected only by a Network.]
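A minimal message-passing sketch in Python, with `multiprocessing` standing in for separate nodes and an interconnect (on a real distributed-memory machine this structure would use MPI or similar): each worker owns its chunk of the data and communicates only by sending messages.

```python
# Message-passing sketch: no shared memory between workers; each worker
# sums its own chunk and "sends" the partial result back as a message.
from multiprocessing import Process, Queue

def worker(chunk, out_queue):
    out_queue.put(sum(chunk))      # send the partial sum as a message

def distributed_sum(data, n_procs=4):
    # Assumes len(data) is divisible by n_procs, for simplicity.
    out = Queue()
    step = len(data) // n_procs
    procs = [Process(target=worker, args=(data[i*step:(i+1)*step], out))
             for i in range(n_procs)]
    for p in procs:
        p.start()
    total = sum(out.get() for _ in procs)   # receive one message per worker
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(distributed_sum(list(range(1000))))   # 499500
```

With the `Queue` replaced by real network sends and receives, this is the shape of a typical reduction on a distributed-memory machine.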
Types of Distributed Memory Machine
Interconnects
• Fully connected
• Array and torus
– Paragon
– T3E
• Crossbar
– IBM SP
– Sun HPC10000
• Hypercube
– Ncube
• Fat Trees
– Meiko CS-2
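The topologies above trade hardware cost (number of links) against worst-case distance (diameter, in hops). A back-of-the-envelope comparison using the standard formulas for P nodes:

```python
# Links vs. diameter for some classic interconnect topologies.
import math

def fully_connected(p):
    return {"links": p * (p - 1) // 2, "diameter": 1}

def ring(p):
    return {"links": p, "diameter": p // 2}

def torus_2d(p):                 # assumes p is a perfect square
    side = math.isqrt(p)
    return {"links": 2 * p, "diameter": 2 * (side // 2)}

def hypercube(p):                # assumes p is a power of two
    d = int(math.log2(p))
    return {"links": p * d // 2, "diameter": d}

for name, fn in [("fully connected", fully_connected), ("ring", ring),
                 ("2-D torus", torus_2d), ("hypercube", hypercube)]:
    print(f"{name:16s} P=64 -> {fn(64)}")
```

Full connectivity gives one-hop routing but a quadratic number of links; hypercubes and tori are the classic compromises, which is why they appear on the machines listed above.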
Multi-tiered/Hybrid Machines
• Clustered SMP nodes (‘CLUMPS’): SMPs or even
cc-NUMAs connected by an interconnect network
• Example systems
– All new IBM SP
– Sun HPC10000s using multiple nodes
– SGI Origin 2000s when linked together
– HP Exemplar using multiple nodes
– SGI SV1 using multiple nodes
Reasons for Each System
• SMPs (shared-memory machines): easy to build, easy to program,
good price/performance for small numbers of processors;
predictable performance due to UMA
• cc-NUMAs (distributed shared-memory machines): enable larger
numbers of processors and a larger shared address space than SMPs
while still being easy to program, but are harder and more
expensive to build
• Distributed-memory MPPs and clusters: easy to build and to scale
to large numbers of processors, but hard to program and to achieve
good performance on
• Multi-tiered/hybrid/CLUMPS: combine the bests (worsts?) of all
worlds… but maximum scalability!
