SlideShare a Scribd company logo
Multiprocessors
ELEC 6200 Computer Architecture and Design
Instructor: Dr. Agrawal
Yu-Chun Chen
10/27/06
Why Choose a Multiprocessor?
• A single CPU can only go so fast, use more
than one CPU to improve performance
• Multiple users
• Multiple applications
• Multi-tasking within an application
• Responsiveness and/or throughput
• Share hardware between CPUs
Multiprocessor Symmetry
• In a multiprocessing system, all CPUs may be equal, or
some may be reserved for special purposes.
• A combination of hardware and operating-system
software design considerations determine the symmetry.
• Systems that treat all CPUs equally are called symmetric
multiprocessing (SMP) systems.
• If all CPUs are not equal, system resources may be
divided in a number of ways, including asymmetric
multiprocessing (ASMP), non-uniform memory access
(NUMA) multiprocessing, and clustered multiprocessing.
Instruction and Data Streams
Multiprocessors can be used in different ways:
• Uniprossesors (single-instruction, single-data or SISD)
• Within a single system to execute multiple, independent
sequences of instructions in multiple contexts (multiple-
instruction, multiple-data or MIMD);
• A single sequence of instructions in multiple contexts
(single-instruction, multiple-data or SIMD, often used in
vector processing);
• Multiple sequences of instructions in a single context
(multiple-instruction, single-data or MISD, used for
redundancy in fail-safe systems and sometimes applied
to describe pipelined processors or hyper threading).
Processor Coupling
Tightly-coupled multiprocessor systems:
• Contain multiple CPUs that are connected at the bus level.
• These CPUs may have access to a central shared memory
(Symmetric Multiprocessing, or SMP), or may participate in a
memory hierarchy with both local and shared memory (Non-
Uniform Memory Access, or NUMA).
• Example: IBM p690 Regatta, Chip multiprocessors, also
known as multi-core computing.
Loosely-coupled multiprocessor systems:
• Often referred as clusters
• Based on multiple standalone single or dual processor
commodity computers interconnected via a high speed
communication system, such as Gigabit ethernet.
• Example: Linux Beowulf cluster
Multiprocessor Communication
Architectures
Message Passing
• Separate address space for each processor
• Processors communicate via message passing
• Processors have private memories
• Focuses attention on costly non-local operations
Shared Memory
• Processors communicate with shared address space
• Processors communicate by memory read/write
• Easy on small-scale machines
• Lower latency
• SMP or NUMA
• The kind that we will focus on today
Shared-Memory Processors
. . .
interconnection network
. . .
processor
1
cache
processor
2
cache
processor
N
cache
memory
1
memory
M
memory
2
•Single copy of the OS (although some parts might be parallel)
•Relatively easy to program and port sequential code to
•Difficult to scale to large numbers of processors
UMA machine block diagram
Types of Shared-Memory Architectures
UMA
• Uniform Memory Access
• Access to all memory occurred at the same speed for
all processors.
• We will focus on UMA today.
NUMA
• Non-Uniform Memory Access
• a.k.a. “Distributed Shared Memory”.
• Typically interconnection is grid or hypercube.
• Access to some parts of memory is faster for some
processors than other parts of memory.
• Harder to program, but scales to more processors
Bus Based UMA
(a) Simplest MP:
More than one processor on a single bus
connect to memory, bus bandwidth becomes
a bottleneck.
(b) Each processor has a cache to reduce the need
to access to memory.
(c) To further scale the number of processors, each
processor is given private local memory.
NUMA
• All memories can be addressed by all processors, but access to a
processor’s own local memory is faster than access to another
processor’s remote memory.
• Looks like a distributed machine, but the interconnection network is
usually custom-designed switches and/or buses.
OS Option 1
Each CPU has its own OS
• Statically allocate physical memory to each CPU
• Each CPU runs its own independents OS
• Share peripherals
• Each CPU handles its processes system calls
• Used in early multiprocessor systems
• Simple to implement
• Avoids concurrency issues by not sharing
• Issues: 1. Each processor has its own scheduling queue.
2. Each processor has its own memory partition.
3. Consistency is an issue with independent disk buffer caches and
potentially shared files.
OS Option 2
Master-Slave Multiprocessors
• OS mostly runs on a single fixed CPU.
• User-level applications run on the other CPUs.
• All system calls are passed to the Master CPU for processing
• Very little synchronisation required
• Single to implement
• Single centralised scheduler to keep all processors busy
• Memory can be allocated as needed to all CPUs.
• Issues: Master CPU becomes the bottleneck.
OS Option 3
Symmetric Multiprocessors (SMP)
• OS kernel runs on all processors, while load and resources are balanced
between all processors.
• One alternative: A single mutex (mutual exclusion object) that make the
entire kernel a large critical section; Only one CPU can be in the kernel at a
time; Only slight better than master-slave
• Better alternative: Identify independent parts of the kernel and make each of
them their own critical section, which allows parallelism in the kernel
• Issues: A difficult task; Code is mostly similar to uniprocessor code; hard
part is identifying independent parts that don’t interfere with each other
Earlier Example
Quad-Processor Pentium Pro
• SMP, bus interconnection.
• 4 x 200 MHz Intel Pentium Pro processors.
• 8 + 8 Kb L1 cache per processor.
• 512 Kb L2 cache per processor.
• Snoopy cache coherence.
• Employed in Compaq, HP, IBM, NetPower.
• OS: Windows NT, Solaris, Linux, etc.
Example
HP Integrity Superdome
• The HP Integrity Superdome is HP's high-end addition to the family of
industry-standard Itanium®-based solutions. The Superdome offers
several configurations from 2-way multiprocessing all the way to 128
CPUs supporting multiple operating systems such as HP-UX 11iV2,
Microsoft Windows Server 2003 Datacentre Edition, Linux and
OpenVMS.
• Processor: 2 to 64 Intel Itanium 2 processors (1.6 GHz with 9 MB
cache)
• Memory: Up to 1TB DDR memory
• 1-16 cell boards (each cell: 2 or 4 processors and 2 to 32 GB memory)
•
• 48 - 96 PCI-X internal hot-plug I/O card slots (Optional Server
Expansion Unit)
• 4 -16 hardware partitions (nPars) using Server Expansion Unit
Conclusion
• Parallel processing is a future technique for
higher performance and effectiveness for
multiprogrammed workloads.
• MPs combine the difficulties of building complex
hardware systems and complex software
systems.
• Communication, memory, affinity and
throughputs presents an important influence on
the systems costs and performances
• On-chip MPs (MPSoC) technology appears to
be growing
References
• www.intel.com
• www.webopedia.com
• www.wikipedia.org
• www.IBM.com
• www.hp.com
• www.cis.upenn.edu/~milom/cis700-Spring04/
• www.tkt.cs.tut.fi/kurssit/3200/S06/Luennot/Lec_notes06/
• www.cs.berkeley.edu/~pattrsn
• www.cs.caltech.edu/~cs284
• cgi.cse.unsw.edu.au/~cs3231/06s1/lectures

More Related Content

Similar to Multiprocessor_YChen.ppt

Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptx
Muhammad54342
 
CA UNIT IV.pptx
CA UNIT IV.pptxCA UNIT IV.pptx
CA UNIT IV.pptx
ssuser9dbd7e
 
CSE3120- Module1 part 1 v1.pptx
CSE3120- Module1 part 1 v1.pptxCSE3120- Module1 part 1 v1.pptx
CSE3120- Module1 part 1 v1.pptx
akhilagajjala
 
Operating System Overview.pdf
Operating System Overview.pdfOperating System Overview.pdf
Operating System Overview.pdf
PrashantKhobragade3
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.ppt
UsamaKhan987353
 
Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler
Sarwan ali
 
Introduction to multi core
Introduction to multi coreIntroduction to multi core
Introduction to multi coremukul bhardwaj
 
MK Sistem Operasi.pdf
MK Sistem Operasi.pdfMK Sistem Operasi.pdf
MK Sistem Operasi.pdf
wisard1
 
Week 13-14 Parrallel Processing-new.pptx
Week 13-14 Parrallel Processing-new.pptxWeek 13-14 Parrallel Processing-new.pptx
Week 13-14 Parrallel Processing-new.pptx
FaizanSaleem81
 
High performance computing
High performance computingHigh performance computing
High performance computing
punjab engineering college, chandigarh
 
Computer Architecture & Organization.ppt
Computer Architecture & Organization.pptComputer Architecture & Organization.ppt
Computer Architecture & Organization.ppt
FarhanaMariyam1
 
assignment_presentaion_jhvvnvhjhbhjhvjh.pptx
assignment_presentaion_jhvvnvhjhbhjhvjh.pptxassignment_presentaion_jhvvnvhjhbhjhvjh.pptx
assignment_presentaion_jhvvnvhjhbhjhvjh.pptx
23mu36
 
Memory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdfMemory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdf
rajaratna4
 
Overview of HPC.pptx
Overview of HPC.pptxOverview of HPC.pptx
Overview of HPC.pptx
sundariprabhu
 
Computer architecture multi core processor
Computer architecture multi core processorComputer architecture multi core processor
Computer architecture multi core processor
Mazin Alwaaly
 
Ch1 introduction
Ch1   introductionCh1   introduction
Ch1 introduction
Welly Dian Astika
 
22CS201 COA
22CS201 COA22CS201 COA
22CS201 COA
Kathirvel Ayyaswamy
 
Module2 MultiThreads.ppt
Module2 MultiThreads.pptModule2 MultiThreads.ppt
Module2 MultiThreads.ppt
shreesha16
 

Similar to Multiprocessor_YChen.ppt (20)

Lect18
Lect18Lect18
Lect18
 
Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptx
 
CA UNIT IV.pptx
CA UNIT IV.pptxCA UNIT IV.pptx
CA UNIT IV.pptx
 
CSE3120- Module1 part 1 v1.pptx
CSE3120- Module1 part 1 v1.pptxCSE3120- Module1 part 1 v1.pptx
CSE3120- Module1 part 1 v1.pptx
 
Operating System Overview.pdf
Operating System Overview.pdfOperating System Overview.pdf
Operating System Overview.pdf
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.ppt
 
Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler
 
Introduction to multi core
Introduction to multi coreIntroduction to multi core
Introduction to multi core
 
MK Sistem Operasi.pdf
MK Sistem Operasi.pdfMK Sistem Operasi.pdf
MK Sistem Operasi.pdf
 
Week 13-14 Parrallel Processing-new.pptx
Week 13-14 Parrallel Processing-new.pptxWeek 13-14 Parrallel Processing-new.pptx
Week 13-14 Parrallel Processing-new.pptx
 
Lecture5
Lecture5Lecture5
Lecture5
 
High performance computing
High performance computingHigh performance computing
High performance computing
 
Computer Architecture & Organization.ppt
Computer Architecture & Organization.pptComputer Architecture & Organization.ppt
Computer Architecture & Organization.ppt
 
assignment_presentaion_jhvvnvhjhbhjhvjh.pptx
assignment_presentaion_jhvvnvhjhbhjhvjh.pptxassignment_presentaion_jhvvnvhjhbhjhvjh.pptx
assignment_presentaion_jhvvnvhjhbhjhvjh.pptx
 
Memory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdfMemory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdf
 
Overview of HPC.pptx
Overview of HPC.pptxOverview of HPC.pptx
Overview of HPC.pptx
 
Computer architecture multi core processor
Computer architecture multi core processorComputer architecture multi core processor
Computer architecture multi core processor
 
Ch1 introduction
Ch1   introductionCh1   introduction
Ch1 introduction
 
22CS201 COA
22CS201 COA22CS201 COA
22CS201 COA
 
Module2 MultiThreads.ppt
Module2 MultiThreads.pptModule2 MultiThreads.ppt
Module2 MultiThreads.ppt
 

Recently uploaded

The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
nodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptxnodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptx
alishadewangan1
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
frank0071
 

Recently uploaded (20)

The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
nodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptxnodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptx
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
 

Multiprocessor_YChen.ppt

  • 1. Multiprocessors ELEC 6200 Computer Architecture and Design Instructor: Dr. Agrawal Yu-Chun Chen 10/27/06
  • 2. Why Choose a Multiprocessor? • A single CPU can only go so fast, use more than one CPU to improve performance • Multiple users • Multiple applications • Multi-tasking within an application • Responsiveness and/or throughput • Share hardware between CPUs
  • 3. Multiprocessor Symmetry • In a multiprocessing system, all CPUs may be equal, or some may be reserved for special purposes. • A combination of hardware and operating-system software design considerations determine the symmetry. • Systems that treat all CPUs equally are called symmetric multiprocessing (SMP) systems. • If all CPUs are not equal, system resources may be divided in a number of ways, including asymmetric multiprocessing (ASMP), non-uniform memory access (NUMA) multiprocessing, and clustered multiprocessing.
  • 4. Instruction and Data Streams Multiprocessors can be used in different ways: • Uniprossesors (single-instruction, single-data or SISD) • Within a single system to execute multiple, independent sequences of instructions in multiple contexts (multiple- instruction, multiple-data or MIMD); • A single sequence of instructions in multiple contexts (single-instruction, multiple-data or SIMD, often used in vector processing); • Multiple sequences of instructions in a single context (multiple-instruction, single-data or MISD, used for redundancy in fail-safe systems and sometimes applied to describe pipelined processors or hyper threading).
  • 5. Processor Coupling Tightly-coupled multiprocessor systems: • Contain multiple CPUs that are connected at the bus level. • These CPUs may have access to a central shared memory (Symmetric Multiprocessing, or SMP), or may participate in a memory hierarchy with both local and shared memory (Non- Uniform Memory Access, or NUMA). • Example: IBM p690 Regatta, Chip multiprocessors, also known as multi-core computing. Loosely-coupled multiprocessor systems: • Often referred as clusters • Based on multiple standalone single or dual processor commodity computers interconnected via a high speed communication system, such as Gigabit ethernet. • Example: Linux Beowulf cluster
  • 6. Multiprocessor Communication Architectures Message Passing • Separate address space for each processor • Processors communicate via message passing • Processors have private memories • Focuses attention on costly non-local operations Shared Memory • Processors communicate with shared address space • Processors communicate by memory read/write • Easy on small-scale machines • Lower latency • SMP or NUMA • The kind that we will focus on today
  • 7. Shared-Memory Processors . . . interconnection network . . . processor 1 cache processor 2 cache processor N cache memory 1 memory M memory 2 •Single copy of the OS (although some parts might be parallel) •Relatively easy to program and port sequential code to •Difficult to scale to large numbers of processors UMA machine block diagram
  • 8. Types of Shared-Memory Architectures UMA • Uniform Memory Access • Access to all memory occurred at the same speed for all processors. • We will focus on UMA today. NUMA • Non-Uniform Memory Access • a.k.a. “Distributed Shared Memory”. • Typically interconnection is grid or hypercube. • Access to some parts of memory is faster for some processors than other parts of memory. • Harder to program, but scales to more processors
  • 9. Bus Based UMA (a) Simplest MP: More than one processor on a single bus connect to memory, bus bandwidth becomes a bottleneck. (b) Each processor has a cache to reduce the need to access to memory. (c) To further scale the number of processors, each processor is given private local memory.
  • 10. NUMA • All memories can be addressed by all processors, but access to a processor’s own local memory is faster than access to another processor’s remote memory. • Looks like a distributed machine, but the interconnection network is usually custom-designed switches and/or buses.
  • 11. OS Option 1 Each CPU has its own OS • Statically allocate physical memory to each CPU • Each CPU runs its own independents OS • Share peripherals • Each CPU handles its processes system calls • Used in early multiprocessor systems • Simple to implement • Avoids concurrency issues by not sharing • Issues: 1. Each processor has its own scheduling queue. 2. Each processor has its own memory partition. 3. Consistency is an issue with independent disk buffer caches and potentially shared files.
  • 12. OS Option 2 Master-Slave Multiprocessors • OS mostly runs on a single fixed CPU. • User-level applications run on the other CPUs. • All system calls are passed to the Master CPU for processing • Very little synchronisation required • Single to implement • Single centralised scheduler to keep all processors busy • Memory can be allocated as needed to all CPUs. • Issues: Master CPU becomes the bottleneck.
  • 13. OS Option 3 Symmetric Multiprocessors (SMP) • OS kernel runs on all processors, while load and resources are balanced between all processors. • One alternative: A single mutex (mutual exclusion object) that make the entire kernel a large critical section; Only one CPU can be in the kernel at a time; Only slight better than master-slave • Better alternative: Identify independent parts of the kernel and make each of them their own critical section, which allows parallelism in the kernel • Issues: A difficult task; Code is mostly similar to uniprocessor code; hard part is identifying independent parts that don’t interfere with each other
  • 14. Earlier Example Quad-Processor Pentium Pro • SMP, bus interconnection. • 4 x 200 MHz Intel Pentium Pro processors. • 8 + 8 Kb L1 cache per processor. • 512 Kb L2 cache per processor. • Snoopy cache coherence. • Employed in Compaq, HP, IBM, NetPower. • OS: Windows NT, Solaris, Linux, etc.
  • 15. Example HP Integrity Superdome • The HP Integrity Superdome is HP's high-end addition to the family of industry-standard Itanium®-based solutions. The Superdome offers several configurations from 2-way multiprocessing all the way to 128 CPUs supporting multiple operating systems such as HP-UX 11iV2, Microsoft Windows Server 2003 Datacentre Edition, Linux and OpenVMS. • Processor: 2 to 64 Intel Itanium 2 processors (1.6 GHz with 9 MB cache) • Memory: Up to 1TB DDR memory • 1-16 cell boards (each cell: 2 or 4 processors and 2 to 32 GB memory) • • 48 - 96 PCI-X internal hot-plug I/O card slots (Optional Server Expansion Unit) • 4 -16 hardware partitions (nPars) using Server Expansion Unit
  • 16. Conclusion • Parallel processing is a future technique for higher performance and effectiveness for multiprogrammed workloads. • MPs combine the difficulties of building complex hardware systems and complex software systems. • Communication, memory, affinity and throughputs presents an important influence on the systems costs and performances • On-chip MPs (MPSoC) technology appears to be growing
  • 17. References • www.intel.com • www.webopedia.com • www.wikipedia.org • www.IBM.com • www.hp.com • www.cis.upenn.edu/~milom/cis700-Spring04/ • www.tkt.cs.tut.fi/kurssit/3200/S06/Luennot/Lec_notes06/ • www.cs.berkeley.edu/~pattrsn • www.cs.caltech.edu/~cs284 • cgi.cse.unsw.edu.au/~cs3231/06s1/lectures