Multithreading
Mr. A. B. Shinde
Assistant Professor,
Electronics Engineering,
P.V.P.I.T., Budhgaon
Contents…
 Using ILP support to exploit thread-level parallelism
 Performance and efficiency in advanced multiple-issue processors
2
Threads
 A thread is a basic unit of CPU utilization.
 From the hardware point of view, a thread can be seen as a separate process with its own instructions and data.
 A thread may represent a process that is part of a parallel program
consisting of multiple processes, or it may represent an
independent program.
3
Threads
 It comprises a thread ID, a program counter, a register set, and a stack.
 It shares its code section, data section, and other operating-system resources, such as open files and signals, with other threads belonging to the same process.
 A traditional process has a single thread of control. If a process has
multiple threads of control, it can perform more than one task at a time.
4
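As a concrete illustration of the two slides above, here is a minimal sketch in C using POSIX threads. It is only an assumed example (the worker logic, names such as shared_counter, and the iteration count are not from the slides): two threads execute the same code and share the process's global data, while each keeps its own stack and its own flow of control.

#include <pthread.h>
#include <stdio.h>

/* Shared data section: visible to every thread of the process. */
static int shared_counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int id = *(int *)arg;              /* 'id' lives on this thread's private stack */
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);
        shared_counter++;              /* data shared with the other thread */
        pthread_mutex_unlock(&lock);
    }
    printf("thread %d done\n", id);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;

    pthread_create(&t1, NULL, worker, &id1);
    pthread_create(&t2, NULL, worker, &id2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("shared_counter = %d\n", shared_counter);   /* prints 2000 */
    return 0;
}

Compiled with gcc -pthread, both threads run the same code section and update the same counter, but each worker() call has its own stack frame and program counter, matching the description above.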
Threads
 Many software packages that run
on modern desktop PCs are
multithreaded.
 For example:
A word processor may have:
a thread for displaying graphics,
another thread for responding to
keystrokes from the user, and
a third thread for performing spelling
and grammar checking in the
background.
5
Threads
 Threads also play a vital role in remote procedure call (RPC)
systems.
 RPC allows interprocess communication by providing a communication mechanism similar to ordinary function or procedure calls.
 Many operating system kernels are multithreaded; several threads operate in the kernel, and each thread performs a specific task, such as managing devices or handling interrupts.
6
Multithreading
 Benefits:
1. Responsiveness: Multithreading an interactive application may allow a program to continue running even if part of it is blocked or performing a lengthy operation, thereby increasing responsiveness to the user.
For example: A multithreaded web browser could still allow user
interaction in one thread while an image was being loaded in another
thread.
2. Resource sharing: By default, threads share the memory and the
resources of the process to which they belong. The benefit of sharing
code and data is that it allows an application to have several different
threads of activity within the same address space.
7
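A hedged sketch of the responsiveness benefit, again in C with POSIX threads (the slow task is merely simulated with sleep(); the name load_image is an assumption made for illustration): one thread blocks on a lengthy operation while the main thread keeps serving the user.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *load_image(void *arg)
{
    (void)arg;
    sleep(3);                          /* stands in for a slow, blocking download */
    printf("image loaded\n");
    return NULL;
}

int main(void)
{
    pthread_t loader;
    pthread_create(&loader, NULL, load_image, NULL);

    /* The main ("user-interface") thread stays responsive while the image loads. */
    for (int i = 0; i < 3; i++) {
        printf("handling user input...\n");
        sleep(1);
    }
    pthread_join(loader, NULL);
    return 0;
}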
Multithreading
 Benefits:
3. Economy: Allocating memory and resources for process creation is costly. Because threads share the resources of the process to which they belong, creating and switching threads is a more economical, cost-effective solution.
4. Utilization of multiprocessor architectures: In a multiprocessor architecture, threads may run in parallel on different processors. A single-threaded process can run on only one CPU, no matter how many are available.
Multithreading on a multi-CPU machine increases concurrency.
8
Multithreading Models
 Support for threads may be provided either at the user level or at
the kernel level.
 User threads are supported above the kernel and are managed
without kernel support, whereas kernel threads are supported and
managed directly by the operating system.
9
Multithreading Models
 Many-to-One Model:
 The many-to-one model maps many user-level threads to one kernel thread.
 Thread management is done by the
thread library in user space, so it is
efficient.
 Only one thread can access the kernel at a time, so multiple threads are unable to run in parallel on multiprocessors.
10
Multithreading Models
 One-to-One Model:
 The one-to-one model maps each user
thread to a kernel thread.
 It provides more concurrency than the many-to-one model. It allows multiple threads to run in parallel on multiprocessors.
 The only drawback to this model is that
creating a user thread requires creating the
corresponding kernel thread.
 The overhead of creating kernel threads can
burden the performance of an application.
11
Multithreading Models
 Many-to-Many Model :
 The many-to-many model multiplexes many
user-level threads to a smaller or equal
number of kernel threads.
 The number of kernel threads may be specific
to either a particular application or a particular
machine.
 Developers can create as many user threads
as necessary, and the corresponding kernel
threads can run in parallel on a
multiprocessor.
12
Multithreading: ILP Support to Exploit
Thread-Level Parallelism
13
Multithreading: ILP Support to Exploit
Thread-Level Parallelism
 Although ILP increases system performance, it can be quite limited or hard to exploit in some applications.
Furthermore, there may be parallelism occurring naturally at a higher level in the application.
For example:
An online transaction-processing system has parallelism among the
multiple queries and updates. These queries and updates can be
processed mostly in parallel, since they are largely independent of one
another.
14
Multithreading: ILP Support to Exploit
Thread-Level Parallelism
 This higher-level parallelism is called thread-level parallelism (TLP)
because it is logically structured as separate threads of execution.
 ILP exploits parallel operations within a loop or a straight-line code segment.
 TLP is explicitly represented by the use of multiple threads of execution that are inherently parallel.
15
Multithreading: ILP Support to Exploit
Thread-Level Parallelism
 Thread-level parallelism is an important alternative to instruction-level parallelism.
 In many applications thread-level parallelism occurs naturally (many
server applications).
 If software is written from scratch, expressing the parallelism is much easier.
 But for established applications written without parallelism in mind, there can be significant challenges, and it can be extremely costly to rewrite them to exploit thread-level parallelism.
16
Multithreading: ILP Support to Exploit
Thread-Level Parallelism
 TLP and ILP exploit two different kinds of parallel structure.
 The crucial question is:
Can we exploit TLP on a processor designed for ILP?
 The answer is: Yes.
A datapath designed to exploit ILP often finds that many functional units are idle because of stalls or dependences in the code.
Threads can supply independent instructions that keep these otherwise idle units busy, allowing TLP to be exploited.
17
Multithreading: ILP Support to Exploit
Thread-Level Parallelism
 Multithreading allows multiple threads to share the functional units
of a single processor in an overlapping fashion.
 To permit this sharing, the processor must duplicate the
independent state of each thread.
 For example:
A separate copy of the register file, a separate PC, and a separate page table are required for each thread.
 In addition, the hardware must support the ability to change to a different thread relatively quickly.
18
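The per-thread state that the hardware must duplicate can be pictured as a small structure replicated once per hardware thread. The C sketch below is only a conceptual model (the 32-register count and field widths are assumptions, not taken from any particular processor):

#include <stdint.h>

#define NUM_ARCH_REGS 32                   /* assumed architectural register count */

/* State replicated for each hardware thread. */
struct hw_thread_context {
    uint64_t pc;                           /* separate program counter           */
    uint64_t regs[NUM_ARCH_REGS];          /* separate copy of the register file */
    uint64_t page_table_base;              /* separate page-table pointer        */
};

/* A 2-way multithreaded core would keep two such contexts: */
struct hw_thread_context contexts[2];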
Multithreading: ILP Support to Exploit
Thread-Level Parallelism
 There are two main approaches to multithreading.
 Fine-grained multithreading &
 Coarse-grained multithreading
19
Multithreading: ILP Support to Exploit
Thread-Level Parallelism
 Fine-grained multithreading:
 It switches between threads on each instruction, causing the
execution of multiple threads to be interleaved.
 This interleaving is often done in a round-robin fashion.
 To make fine-grained multithreading practical, the CPU must be
able to switch threads on every clock cycle.
 Advantage: It can hide the throughput losses that arise from both short
and long stalls.
 Disadvantage: It slows down the execution of the individual threads.
20
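To make the round-robin idea concrete, the toy C simulation below (entirely an illustrative assumption, not a model of any real pipeline) issues one instruction per cycle, rotating over four threads; every fifth instruction of a thread is assumed to miss in the cache and stall that thread for six cycles, and a stalled thread is simply skipped.

#include <stdio.h>

#define NUM_THREADS 4
#define CYCLES      20

int main(void)
{
    int stall_until[NUM_THREADS] = {0};   /* cycle at which each thread is ready again */
    int issued[NUM_THREADS] = {0};
    int idle_cycles = 0;

    for (int cycle = 0; cycle < CYCLES; cycle++) {
        int did_issue = 0;
        /* Round-robin: start with the thread whose turn it is, skip stalled threads. */
        for (int k = 0; k < NUM_THREADS; k++) {
            int t = (cycle + k) % NUM_THREADS;
            if (stall_until[t] <= cycle) {
                issued[t]++;
                did_issue = 1;
                if (issued[t] % 5 == 0)            /* assumed cache miss */
                    stall_until[t] = cycle + 6;    /* 6-cycle stall */
                break;
            }
        }
        if (!did_issue)
            idle_cycles++;
    }

    printf("idle cycles: %d of %d\n", idle_cycles, CYCLES);
    for (int t = 0; t < NUM_THREADS; t++)
        printf("thread %d issued %d instructions\n", t, issued[t]);
    return 0;
}

With four threads, the short stalls of one thread are almost always covered by instructions from the others (the advantage above), but each individual thread now issues on only about one cycle in four (the disadvantage above).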
Multithreading: ILP Support to Exploit
Thread-Level Parallelism
 Coarse-grained multithreading:
 It was invented as an alternative to fine-grained multithreading.
 Coarse-grained multithreading switches threads only on costly
(larger) stalls.
 Advantage: It relieves the need for thread switching to be essentially free (nearly zero cost).
 Disadvantage: It is limited in its ability to hide throughput losses, especially from shorter stalls, since instructions from other threads are issued only when a thread encounters a costly (long) stall.
21
Multithreading: ILP Support to Exploit
Thread-Level Parallelism
 A CPU with coarse-grained multithreading issues instructions from a single thread at a time.
 When a stall occurs, the pipeline must be emptied or frozen.
 The new thread that executes after the stall must first fill the pipeline.
 Because of this start-up overhead, coarse-grained multithreading is much more useful for reducing the penalty of high-cost stalls, where the pipeline refill time is negligible compared to the stall time.
22
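A companion sketch for coarse-grained multithreading (again a toy assumption, not a real design): the processor stays with one thread until that thread hits a long stall, and every switch costs a fixed pipeline-refill penalty during which nothing issues.

#include <stdio.h>

#define NUM_THREADS    2
#define CYCLES         40
#define REFILL_PENALTY 4                  /* assumed pipeline-refill cost of a switch */

int main(void)
{
    int stall_until[NUM_THREADS] = {0};
    int issued[NUM_THREADS] = {0};
    int current = 0, refill_left = 0, idle_cycles = 0;

    for (int cycle = 0; cycle < CYCLES; cycle++) {
        if (refill_left > 0) {                    /* pipeline is still refilling */
            refill_left--;
            idle_cycles++;
            continue;
        }
        if (stall_until[current] > cycle) {       /* long stall: switch threads */
            current = (current + 1) % NUM_THREADS;
            refill_left = REFILL_PENALTY;
            idle_cycles++;
            continue;
        }
        issued[current]++;
        if (issued[current] % 8 == 0)             /* assume every 8th instruction misses */
            stall_until[current] = cycle + 12;    /* 12-cycle stall */
    }

    printf("idle cycles: %d of %d\n", idle_cycles, CYCLES);
    for (int t = 0; t < NUM_THREADS; t++)
        printf("thread %d issued %d instructions\n", t, issued[t]);
    return 0;
}

Because every switch throws away REFILL_PENALTY cycles, switching pays off only when the stall being hidden (12 cycles here) is much longer than the refill time, which is exactly the point of this slide.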
Converting Thread-Level
Parallelism into Instruction-Level Parallelism
 Simultaneous multithreading (SMT) is a variation on multithreading
that uses the resources of a multiple-issue, dynamically scheduled
processor to exploit TLP.
 Multiple-issue processors often have more functional-unit parallelism available than a single thread can use effectively, which motivates the use of SMT.
 With register renaming and dynamic scheduling, multiple instructions
from independent threads can be issued without considering the
dependences among them.
23
Converting Thread-Level
Parallelism into Instruction-Level Parallelism
The figure illustrates the differences in a processor’s ability to exploit the resources of a superscalar for the following configurations:
 A superscalar with no multithreading support
 A superscalar with coarse-grained multithreading
 A superscalar with fine-grained multithreading
 A superscalar with simultaneous multithreading
24
Converting Thread-Level
Parallelism into Instruction-Level Parallelism
 In the superscalar without multithreading support,
the use of issue slots is limited by a lack of ILP.
 In addition, a major stall, such as an instruction
cache miss, can leave the entire processor idle.
25
An empty (white) box indicates that the
corresponding issue slot is unused in that clock
cycle.
Black is used to indicate the occupied issue slots.
Converting Thread-Level
Parallelism into Instruction-Level Parallelism
 In the coarse-grained multithreaded superscalar,
the long stalls are partially hidden by switching
to another thread that uses the resources of the
processor.
 This reduces the number of completely idle clock cycles; within each clock cycle, however, ILP limitations still lead to idle issue slots.
 Because thread switching in a coarse-grained multithreaded processor occurs only when there is a stall, and the pipeline must then be refilled, some fully idle cycles still remain.
26
The shades of grey and black correspond to
different threads in the multithreading processors.
Converting Thread-Level
Parallelism into Instruction-Level Parallelism
 In the fine-grained multithreaded superscalar, the interleaving of threads eliminates fully idle clock cycles.
 Because only one thread issues instructions in
a given clock cycle, ILP limitations still lead to a
significant number of idle slots within individual
clock cycles.
27
An empty (white) box indicates that the
corresponding issue slot is unused in that clock
cycle.
The shades of grey and black correspond to four
different threads in the multithreading processors.
Converting Thread-Level
Parallelism into Instruction-Level Parallelism
 In SMT, TLP and ILP are exploited
simultaneously.
 Ideally, the issue slot usage is limited by
imbalances in the resource needs and resource
availability over multiple threads.
 In practice, other factors —
- how many active threads are considered,
- finite limitations on buffers,
- the ability to fetch enough instructions from
multiple threads, and
- practical limitations of what instruction
combinations can issue from one thread and
from multiple threads—can also restrict how
many slots are used.
28
Converting Thread-Level
Parallelism into Instruction-Level Parallelism
 Design Challenges in SMT
 Because a dynamically scheduled superscalar processor is likely to have a deep pipeline, coarse-grained multithreading is unlikely to gain much in performance.
 Since SMT makes sense only in a fine-grained implementation, we should think about the impact of fine-grained scheduling on single-thread performance.
 This effect can be minimized by having a preferred thread, which
still permits multithreading to preserve some of its performance
advantage with a smaller compromise in single-thread performance.
29
Converting Thread-Level
Parallelism into Instruction-Level Parallelism
 Design Challenges in SMT
 Other design challenges for an SMT processor:
 Dealing with a larger register file needed to hold multiple contexts.
 Not affecting the clock cycle, particularly in instruction issue, where more candidate instructions need to be considered, and in instruction completion, where choosing which instructions to commit may be challenging.
 Ensuring that the cache and TLB conflicts generated by the
simultaneous execution of multiple threads do not cause significant
performance degradation is also challenging.
30
Converting Thread-Level
Parallelism into Instruction-Level Parallelism
 Design Challenges in SMT
 In many cases, the potential performance overhead due to
multithreading is small.
 The efficiency of current superscalars is low enough that there is
scope for significant improvement, even at the cost of some overhead.
31
Performance and Efficiency in Advanced
Multiple-Issue Processors
32
Performance and Efficiency in Advanced
Multiple-Issue Processors
 The question of efficiency in terms of silicon area and power is
equally critical.
 Power is the major constraint on modern processors.
 The Itanium 2 is the most inefficient processor for both floating-point and integer code.
 The Athlon and Pentium 4 both make good use of transistors and silicon area in terms of efficiency.
 The IBM Power5 is the most effective user of energy.
 None of the processors offers a great advantage in efficiency.
33
Performance and Efficiency in Advanced
Multiple-Issue Processors
 What Limits Multiple-Issue Processors?
 Power is a function of both static power (proportional to the transistor
count, whether or not the transistors are switching), and dynamic
power (proportional to the product of the number of switching
transistors and the switching rate).
 Static power is certainly a design concern, but when the processor is operating, dynamic power is usually the dominant energy consumer.
 A microprocessor trying to achieve both a lower CPI and a higher clock rate must switch more transistors and switch them faster.
34
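The relationship can be written out numerically. The small C sketch below uses made-up illustrative constants (assumptions, not measurements of any processor) to estimate power as static power plus a dynamic term proportional to the number of switching transistors times the switching rate.

#include <stdio.h>

int main(void)
{
    /* Illustrative, assumed parameters for a baseline and a more aggressive design. */
    double static_power   = 20.0;      /* W, roughly tied to transistor count        */
    double k              = 1e-15;     /* W per (switching transistor * Hz), assumed */

    double base_switching = 50e6;      /* transistors switching per cycle            */
    double base_clock     = 2.0e9;     /* Hz                                         */

    double aggr_switching = 80e6;      /* lower CPI => more transistors switching    */
    double aggr_clock     = 3.0e9;     /* higher clock rate                          */

    double base_power = static_power + k * base_switching * base_clock;
    double aggr_power = static_power + k * aggr_switching * aggr_clock;

    printf("baseline   : %.1f W\n", base_power);   /* 20 + 100 = 120 W */
    printf("aggressive : %.1f W\n", aggr_power);   /* 20 + 240 = 260 W */
    return 0;
}

Chasing a lower CPI (more transistors switching each cycle) and a higher clock rate at the same time multiplies the dynamic term, which is why it tends to dominate the power budget.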
Performance and Efficiency in Advanced
Multiple-Issue Processors
 What Limits Multiple-Issue Processors?
 Most techniques for increasing performance (such as multiple cores and multithreading) also increase power consumption.
 The key question is whether a technique is energy efficient: does it increase power consumption faster than it increases performance?
35
Performance and Efficiency in Advanced
Multiple-Issue Processors
 What Limits Multiple-Issue Processors?
 This inefficiency arises from two primary characteristics:
 First, issuing multiple instructions incurs some overhead in logic
that grows faster than the issue rate grows.
 This logic is responsible for instruction issue analysis, including
dependence checking, register renaming, and similar functions.
 The combined result is that lower CPIs are likely to lead to lower ratios of performance per watt, simply due to overhead.
36
Performance and Efficiency in Advanced
Multiple-Issue Processors
 What Limits Multiple-Issue Processors?
 Second, there is a growing gap between peak issue rates and sustained performance.
 The number of transistors switching will be proportional to the
peak issue rate, and the performance is proportional to the
sustained rate.
 For example: If we want to sustain four instructions per clock, we must
fetch more, issue more, and initiate execution on more than four
instructions.
 The power will be proportional to the peak rate, but performance
will be at the sustained rate.
37
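A small numeric sketch of the peak-versus-sustained gap (the rates are assumptions chosen only for illustration): if power tracks the peak issue rate while delivered performance tracks the sustained rate, performance per watt falls as the gap widens.

#include <stdio.h>

int main(void)
{
    /* Assumed issue rates in instructions per clock. */
    double peak_issue      = 6.0;   /* what the fetch/issue hardware is provisioned for */
    double sustained_issue = 4.0;   /* what the pipeline actually sustains              */

    /* Assume power scales with the peak rate and performance with the sustained rate. */
    printf("performance per unit power: %.2f of the ideal\n",
           sustained_issue / peak_issue);           /* 4/6 = 0.67 */
    return 0;
}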
Performance and Efficiency in Advanced
Multiple-Issue Processors
 What Limits Multiple-Issue Processors?
 An important technique for increasing the exploitation of ILP, namely speculation, is inherently inefficient, because it can never be perfect.
 If speculation were perfect, it could save power, since it would
reduce the execution time and save static power.
 When speculation is not perfect, it rapidly becomes energy
inefficient, since it requires additional dynamic power.
38
Performance and Efficiency in Advanced
Multiple-Issue Processors
 What Limits Multiple-Issue Processors?
 Focusing on improving clock rate:
 Increasing the clock rate will increase transistor switching
frequency and directly increase power consumption.
 To achieve a faster clock rate, we would need to increase pipeline
depth.
 Deeper pipelines incur additional overhead penalties as well as higher switching rates.
39
40
This presentation is published only for educational purposes.
shindesir.pvp@gmail.com
