Velammal Engineering College
Department of Computer Science
and Engineering
Welcome…
Slide Sources: Patterson & Hennessy COD book
website (copyright Morgan Kaufmann) adapted
and supplemented
Mr. A. Arockia Abins &
Ms. R. Amirthavalli,
Asst. Prof,
CSE,
Velammal Engineering College
Subject Code / Name:
19IT202T /
Computer Architecture
Syllabus – Unit IV
UNIT-IV PARALLELISM
Introduction to Multicore processors and other shared memory
multiprocessors - Flynn's classification: SISD, MIMD, SIMD, SPMD
and Vector - Hardware multithreading: Fine-grained, Coarse-grained and
Simultaneous Multithreading (SMT) - GPU architecture: NVIDIA GPU
Architecture, NVIDIA GPU Memory Structure
Topics:
• Introduction to Multicore processors
• Other shared memory multiprocessors
• Flynn’s classification:
o SISD,
o MIMD,
o SIMD,
o SPMD and Vector
• Hardware multithreading
• GPU architecture
4
Introduction to
Multicore
processors
Multicore processors
• What is a Processor?
o A single chip package that fits in a socket
o Cores can have functional units, cache, etc.
associated with them
• The main goal of multicore design is to increase
processing power by providing multiple computing units on one chip.
• A multicore processor is a single computing
component with two or more “independent”
processors (called "cores").
• Also known as a chip multiprocessor (CMP)
6
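As a small, hedged illustration (not part of the original slides), a C++ program can query how many hardware threads the machine exposes; on a multicore chip this usually reflects the number of cores, or logical cores when SMT is enabled.

```cpp
#include <iostream>
#include <thread>

int main() {
    // Number of hardware threads the implementation can run concurrently
    // (cores, or logical cores with SMT); may return 0 if unknown.
    unsigned n = std::thread::hardware_concurrency();
    std::cout << "Hardware threads available: " << n << "\n";
}
```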
EXAMPLES
 dual-core processor with 2 cores
• e.g. AMD Phenom II X2, Intel Core 2 Duo E8500
 quad-core processor with 4 cores
• e.g. AMD Phenom II X4, Intel Core i5 2500T
 hexa-core processor with 6 cores
• e.g. AMD Phenom II X6, Intel Core i7 Extreme Ed. 980X
 octa-core processor with 8 cores
• e.g. AMD FX-8150, Intel Xeon E7-2820
7
Processor
8
Single core
9
Multicore
10
Number of core types
Homogeneous (symmetric) cores:
• All of the cores in a homogeneous multicore
processor are of the same type; typically the core
processing units are general-purpose central
processing units that run a single multicore
operating system.
• Example: Intel Core 2
Heterogeneous (asymmetric) cores:
• Heterogeneous multicore processors have a mix of
core types that often run different operating systems
and include graphics processing units.
• Example: IBM's Cell processor, used in the Sony
PlayStation 3 video game console
11
Homogeneous Multicore Processor
12
Heterogeneous Multicore Processor
13
shared memory multiprocessors
14
Shared Memory Multiprocessors
• A system with multiple CPUs “sharing” the same
main memory is called a multiprocessor.
• In a multiprocessor system, all processes on the
various CPUs share a single logical address space,
which is mapped onto a physical memory that may be
distributed among the processors.
• Each process can read and write a data item simply
using load and store operations, and process
communication is through shared memory.
15
Shared Memory Multiprocessors
• Processors communicate through shared variables in
memory, with all processors capable of accessing
any memory location via loads and stores.
16
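A minimal sketch of this idea, assuming standard C++ threads (my example, not from the slides): several threads in one address space read a shared array with ordinary loads and publish their partial results through a shared variable.

```cpp
#include <algorithm>
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    const unsigned N = 1000000;
    std::vector<int> data(N, 1);         // shared data in a single address space
    std::atomic<long long> total{0};     // shared variable all threads update

    unsigned p = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < p; ++t) {
        workers.emplace_back([&, t] {
            long long local = 0;
            for (unsigned i = t; i < N; i += p)  // each thread reads its share via loads
                local += data[i];
            total += local;                      // communicate the result through shared memory
        });
    }
    for (auto& w : workers) w.join();
    std::cout << "sum = " << total << "\n";      // expect N
}
```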
Questions:
• Multicore processor
• Hexacore processor
• Homogeneous Multicore processor
• Heterogeneous Multicore processor
• Multiprocessor
• Shared memory Multiprocessor
17
• Single address space multiprocessors come in two styles.
o Uniform Memory Access (UMA)
o Non-Uniform Memory Access (NUMA)
UMA Architecture:
• In the first style, the latency to a word in memory does
not depend on which processor asks for it. Such
machines are called uniform memory access (UMA)
multiprocessors.
NUMA/DSMA Architecture:
• In the second style, some memory accesses are much
faster than others, depending on which processor asks
for which word, typically because main memory is divided
and attached to different microprocessors or to different
memory controllers on the same chip.
• Such machines are called nonuniform memory access
(NUMA) multiprocessors.
18
Types:
• The shared-memory multiprocessors fall into two
classes, depending on the number of processors
involved, which in turn dictates a memory
organization and interconnect strategy.
• They are:
1. Centralized shared memory (Uniform Memory
Access)
2. Distributed shared memory (NonUniform Memory
Access)
19
1. Centralized shared memory architecture
20
2. Distributed shared memory architecture
21
Flynn’s
classification
Flynn's classification:
• In 1966, Michael Flynn proposed a
classification for computer architectures based
on the number of instruction streams and data
streams (Flynn’s Taxonomy).
o SISD (Single Instruction stream, Single Data
stream)
o SIMD (Single Instruction stream, Multiple Data
streams)
o MISD (Multiple Instruction streams, Single Data
stream)
o MIMD (Multiple Instruction streams, Multiple Data
streams)
23
Flynn's classification:
24
Simple Diagrammatic Representation
SISD
• SISD machines execute a single instruction on individual
data values using a single processor.
• Based on traditional Von Neumann uniprocessor
architecture, instructions are executed sequentially or
serially, one step after the next.
• Until recently, most computers were of the SISD type.
• Conventional uniprocessor
25
SISD
26
SIMD
• An SIMD machine executes a single instruction on
multiple data values simultaneously using many
processors.
• Since there is only one instruction stream, each processor does
not have to fetch and decode instructions itself. Instead, a
single control unit does the fetching and decoding for all
processors.
• SIMD architectures include array processors.
27
SIMD
• Data level parallelism:
o Parallelism achieved by performing the same operation on independent
data.
28
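A small illustrative loop (my example, not the book's): the same operation is applied to every independent element, which is exactly the pattern SIMD hardware or a vectorizing compiler can execute on many data values at once.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// The same operation applied element-wise to independent data; SIMD hardware
// (or a vectorizing compiler) can execute several iterations per instruction.
void axpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = a * x[i] + y[i];    // one instruction pattern, many data elements
}

int main() {
    std::vector<float> x(8, 2.0f), y(8, 1.0f);
    axpy(3.0f, x, y);              // every y[i] becomes 3*2 + 1 = 7
    std::cout << y[0] << " ... " << y[7] << "\n";
}
```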
MISD
• Each processor executes a different sequence of instructions.
• In MISD computers, multiple processing units operate on one single data stream.
• This category has essentially no practical examples; it was included in the
taxonomy for the sake of completeness.
29
MISD
30
Questions:
• Uniform Memory Access (UMA)
• Non-Uniform Memory Access (NUMA)
• Centralized shared memory
• Distributed shared memory
• Flynn’s classification:
31
MIMD
• MIMD machines are usually referred to as
multiprocessors or multicomputers.
• Unlike SIMD machines, they may execute several
different instructions simultaneously.
• Each processor includes its own control unit; the
processors may be assigned parts of a common task or
separate tasks.
• MIMD has two subclasses: shared memory and distributed
memory
32
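A toy sketch of the MIMD idea using C++ threads (illustrative only): two threads execute different instruction streams on different data at the same time, in contrast to SIMD, where all units execute the same instruction.

```cpp
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> v(100);
    std::iota(v.begin(), v.end(), 1);      // 1..100

    long long sum = 0;
    int maximum = 0;

    std::thread t1([&] {                   // task 1: compute a sum
        for (int x : v) sum += x;
    });
    std::thread t2([&] {                   // task 2: find the maximum
        for (int x : v) if (x > maximum) maximum = x;
    });
    t1.join();
    t2.join();
    std::cout << "sum = " << sum << ", max = " << maximum << "\n";
}
```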
MIMD
33
Analogy of Flynn’s Classifications
• An analogy of Flynn’s classification is the check-in desk at an airport
 SISD: a single desk
 SIMD: many desks and a supervisor with a megaphone giving instructions that every desk obeys
 MIMD: many desks working at their own pace, synchronized through a central database
34
Hardware categorization
35
SSE : Streaming SIMD Extensions
Processor Organizations – Computer Architecture Classifications:
• Single Instruction, Single Data Stream (SISD) → Uniprocessor
• Single Instruction, Multiple Data Stream (SIMD) → Vector Processor, Array Processor
• Multiple Instruction, Single Data Stream (MISD)
• Multiple Instruction, Multiple Data Stream (MIMD) → Shared Memory (tightly coupled), Multicomputer (loosely coupled)
Vector
• A more elegant interpretation of SIMD is called a vector architecture.
• Vector architectures pipeline the ALU to get good performance at lower cost.
• They collect data elements from memory, put them in order into a large set of
registers, and operate on them in the registers using pipelined execution units.
• They then write the results back to memory.
37
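The following is only a software analogy of that description (VLEN is a made-up stand-in for the hardware vector length): data elements are gathered into a "vector register", operated on as a block, and written back.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <iostream>
#include <vector>

constexpr std::size_t VLEN = 4;   // pretend hardware vector length (assumption)

void scale(std::vector<double>& a, double k) {
    std::array<double, VLEN> vreg{};                              // pretend vector register
    for (std::size_t i = 0; i < a.size(); i += VLEN) {
        std::size_t n = std::min(VLEN, a.size() - i);
        for (std::size_t j = 0; j < n; ++j) vreg[j] = a[i + j];   // vector load
        for (std::size_t j = 0; j < n; ++j) vreg[j] *= k;         // vector operation
        for (std::size_t j = 0; j < n; ++j) a[i + j] = vreg[j];   // vector store
    }
}

int main() {
    std::vector<double> a{1, 2, 3, 4, 5, 6, 7};
    scale(a, 10.0);
    for (double x : a) std::cout << x << ' ';
    std::cout << '\n';
}
```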
Structure of a vector unit containing four lanes
38
vector lane
• One or more vector functional units and a portion of the vector register file.
39
Questions:
• MIMD
• Examples for Flynn’s classification
40
Hardware
multithreading
Hardware multithreading
• A thread is a lightweight process with its own instructions
and data.
• Each thread has all the state (instructions, data, PC,
register state, etc.) necessary to allow it to execute.
• Multithreading (MT) allows multiple threads to share the
functional units of a single processor.
42
Hardware multithreading
• Increasing utilization of a processor by switching to
another thread when one thread is stalled.
• Types of Multithreading:
o Fine-grained Multithreading
• Cycle by cycle
o Coarse-grained Multithreading
• Switch on event (e.g., cache miss)
o Simultaneous Multithreading (SMT)
• Instructions from multiple threads executed concurrently in the same
cycle
43
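The toy model below is purely illustrative (the thread names and the stall pattern are invented): it contrasts the two switching policies, with fine-grained MT picking a different thread every cycle and coarse-grained MT staying on one thread until a costly stall occurs.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::vector<std::string> threads = {"A", "B", "C", "D"};
    const int cycles = 8;

    std::cout << "fine-grained:   ";
    for (int c = 0; c < cycles; ++c)
        std::cout << threads[c % threads.size()] << ' ';    // switch every cycle

    std::cout << "\ncoarse-grained: ";
    std::size_t cur = 0;
    for (int c = 0; c < cycles; ++c) {
        std::cout << threads[cur] << ' ';
        bool costly_stall = (c % 3 == 2);                    // pretend stall every 3rd cycle
        if (costly_stall) cur = (cur + 1) % threads.size();  // switch only on a stall
    }
    std::cout << '\n';
}
```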
(Diagram: four threads A, B, C and D sharing a 4-issue machine.)
Fine-grained MT
Idea: Switch to another thread every cycle such
that no two instructions from the same thread are in the
pipeline concurrently
Advantages
+ No need for dependency checking between instructions
(only one instruction in pipeline from a single thread)
+ No need for branch prediction logic
+ Otherwise-bubble cycles used for executing useful instructions
from different threads
+ Improved system throughput, latency tolerance, utilization
Fine-grained MT
Idea: Switch to another thread every cycle such
that no two instructions from the same thread are in the
pipeline concurrently
Disadvantages
- Extra hardware complexity: multiple hardware contexts, thread
selection logic
- Reduced single thread performance (one instruction fetched every
N cycles)
- Resource contention between threads in caches and memory
- Dependency checking logic between threads remains (load/store)
47
Fine-grained MT
(Diagram: fine-grained MT issue slots across cycles, with instructions from the four threads interleaved cycle by cycle; the numbers give the time stamp of each instruction in single-thread execution.)
48
Coarse-grained MT switches threads only on
costly stalls, such as L2 misses.
The processor is not slowed down (by thread
switching), since instructions from other threads
will only be issued when a thread encounters a
costly stall.
Since a CPU with coarse-grained MT issues
instructions from a single thread, when a stall
occurs the pipeline must be emptied.
The new thread must fill the pipeline before
instructions will be able to complete.
49
Coarse-grained MT switches threads only on
costly stalls, such as L2 misses.
Advantages:
– thread switching does not have to be essentially
free, and an individual thread is much less likely to be
slowed down
Disadvantage:
– limited in its ability to overcome throughput loss,
because of pipeline start-up costs
Pipeline must be flushed and refilled on thread
switches
50
Coarse-grained MT
Questions
• Define thread.
• What is meant by hardware multithreading?
• Types of multithreading
51
Simultaneous Multithreading
52
Simultaneous multithreading (SMT) is a
variation on hardware multithreading that exploits
thread-level parallelism (TLP) at the same time as
instruction-level parallelism (ILP).
SMT is motivated by multiple-issue processors,
which have more functional-unit parallelism than a
single thread can effectively use.
Multiple instructions from different threads can be
issued in the same cycle.
53
(Diagram: SMT issue slots across cycles, with instructions from several threads issued in the same cycle; the numbers give the time stamp of each instruction in single-thread execution.)
54
Approaches to use the issue slots.
55
Amdahl’s law
Speedup
• Speedup measures the performance gain (reduction in running time) due to
parallelism. The number of PEs is given by n.
• Based on running times, S(n) = ts/tp , where
o ts is the execution time on a single processor, using the fastest known
sequential algorithm
o tp is the execution time using a parallel processor.
• For theoretical analysis, S(n) = ts/tp where
o ts is the worst case running time of the fastest known sequential algorithm
for the problem
o tp is the worst case running time of the parallel algorithm using n PEs.
57
Speedup in Simplest Terms
58
Amdahl’s law:
“It states that the potential speedup gained by the parallel execution of a
program is limited by the portion that can be parallelized.”
59
Amdahl’s law
• Assume the execution time before the improvement is 1 (in some unit of time).
60
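Writing the slide's statement as a formula (with the original execution time taken as 1, F the parallelizable/improved fraction, and n the number of processors or the improvement factor):

```latex
\text{Execution time}_{\text{new}} = (1 - F) + \frac{F}{n},
\qquad
\text{Speedup} = \frac{1}{(1 - F) + F/n}
```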
Question:
• When parallelizing an application, the ideal speedup is speeding up by the
number of processors. What is the speedup with 8 processors if 60% of the
application is parallelizable?
61
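One way to work this, assuming Amdahl's law with F = 0.6 and n = 8:

```latex
\text{Speedup} = \frac{1}{(1-0.6) + 0.6/8} = \frac{1}{0.4 + 0.075} = \frac{1}{0.475} \approx 2.1
```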
Question:
• When parallelizing an application, the ideal speedup is speeding up by the
number of processors. What is the speedup with 8 processors if 80% of the
application is parallelizable?
62
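Similarly, with F = 0.8 and n = 8:

```latex
\text{Speedup} = \frac{1}{(1-0.8) + 0.8/8} = \frac{1}{0.2 + 0.1} = \frac{1}{0.3} \approx 3.3
```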
QUESTION:
• Suppose that we are considering an enhancement that runs 10 times faster
than the original machine but is usable only 40% of the time. What is the
overall speedup gained by incorporating the enhancement.?
63
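Using the same formula, with enhanced fraction F = 0.4 and improvement factor 10:

```latex
\text{Speedup}_{\text{overall}} = \frac{1}{(1-0.4) + 0.4/10} = \frac{1}{0.64} \approx 1.56
```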
Question
• Suppose you want to achieve a speed-up of 90
times faster with 100 processors. What
percentage of the original computation can be
sequential?
64
Question
• Suppose you want to achieve a speed-up of 90
times faster with 100 processors. What
percentage of the original computation can be
sequential?
65
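One way to work this: let F be the fraction that can be parallelized and solve Amdahl's law for F.

```latex
90 = \frac{1}{(1-F) + F/100}
\;\Rightarrow\; (1-F) + \frac{F}{100} = \frac{1}{90}
\;\Rightarrow\; F \approx 0.9989
```

So at most roughly 0.1% of the original computation can be sequential.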
Question
• Suppose you want to perform two sums: one is a sum of 10
scalar variables, and one is a matrix sum of a pair of two-
dimensional arrays, with dimensions 10 by 10. For now
let’s assume only the matrix sum is parallelizable. What
speed-up do you get with 10 versus 40 processors?
• Next, calculate the speed-ups assuming the matrices grow
to 20 by 20.
66
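One worked sketch, counting each addition as one time unit: the serial program performs 10 scalar additions plus 100 matrix-element additions (10 × 10), and only the matrix part is spread over p processors.

```latex
T(1) = 110, \qquad T(p) = 10 + \frac{100}{p}
```

With 10 processors T = 20, so the speedup is 110/20 = 5.5; with 40 processors T = 12.5, so the speedup is 110/12.5 = 8.8. For 20 × 20 matrices the serial time becomes 410: with 10 processors T = 50 (speedup 8.2), and with 40 processors T = 20 (speedup 20.5). Larger problems get closer to the ideal speedup.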
Graphics processing
unit (GPU)
Graphics processing unit (GPU)
• It is a processor optimized for 2D/3D graphics, video, visual computing, and display.
• It is a highly parallel, highly multithreaded multiprocessor optimized for visual
computing.
• It provides real-time visual interaction with computed objects via graphics, images,
and video.
• Heterogeneous systems combine a GPU with a CPU.
68
GPU Hardware
69
70
An Introduction to the NVIDIA GPU Architecture
71
NVIDIA GPU Memory Structures
72
Thank you…