The document discusses multithreading and how it can be used to exploit thread-level parallelism (TLP) on processors designed for instruction-level parallelism (ILP). There are two main approaches to multithreading: fine-grained, which switches threads on every instruction, and coarse-grained, which switches only on long stalls. Simultaneous multithreading (SMT) allows a processor to issue instructions from multiple threads in the same cycle by treating instructions from different threads as independent. This converts TLP into additional ILP and makes better use of the resources of superscalar and multicore processors.
1. Multithreading
Mr. A. B. Shinde
Assistant Professor,
Electronics Engineering,
P.V.P.I.T., Budhgaon
2. Contents…
Using ILP support to exploit thread-level parallelism
Performance and efficiency in advanced multiple-issue processors
3. Threads
A thread is a basic unit of CPU utilization.
A thread is a separate process with its own instructions and data. A thread may represent a process that is part of a parallel program consisting of multiple processes, or it may represent an independent program.
4. Threads
A thread comprises a thread ID, a program counter, a register set, and a stack.
It shares its code section, data section, and other operating-system resources, such as open files and signals, with the other threads belonging to the same process.
A traditional process has a single thread of control. If a process has multiple threads of control, it can perform more than one task at a time.
5. Threads
Many software packages that run on modern desktop PCs are multithreaded.
For example, a word processor may have:
a thread for displaying graphics,
another thread for responding to keystrokes from the user, and
a third thread for performing spelling and grammar checking in the background.
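The word-processor example above can be sketched with ordinary OS threads. This is a minimal illustration, not a real editor: the three task functions are hypothetical stand-ins for the real work, and each simply records that it ran.

```python
import threading

# Three threads doing separate jobs inside one process, as in the
# word-processor example. The tasks are hypothetical placeholders.
results = {}

def render_display():
    results["display"] = "graphics drawn"

def read_keystrokes():
    results["input"] = "keystrokes handled"

def check_spelling():
    results["spellcheck"] = "document checked"

threads = [threading.Thread(target=f)
           for f in (render_display, read_keystrokes, check_spelling)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All three tasks ran concurrently within the same process.
print(sorted(results))  # ['display', 'input', 'spellcheck']
```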
6. Threads
Threads also play a vital role in remote procedure call (RPC) systems. RPC allows interprocess communication by providing a communication mechanism similar to ordinary function or procedure calls.
Many operating system kernels are multithreaded; several threads operate in the kernel, and each thread performs a specific task, such as managing devices or handling interrupts.
7. Multithreading
Benefits:
1. Responsiveness: Multithreading an interactive application may allow a program to continue running even if part of it is blocked, thereby increasing responsiveness to the user. For example, a multithreaded web browser could still allow user interaction in one thread while an image is being loaded in another thread.
2. Resource sharing: By default, threads share the memory and the resources of the process to which they belong. The benefit of sharing code and data is that it allows an application to have several different threads of activity within the same address space.
8. Multithreading
Benefits:
3. Economy: Allocating memory and resources for process creation is costly. Since threads share the resources of the process to which they belong, creating and switching threads is a cost-effective alternative.
4. Utilization of multiprocessor architectures: In a multiprocessor architecture, threads may run in parallel on different processors. A single-threaded process can run on only one CPU, no matter how many are available. Multithreading on a multi-CPU machine increases concurrency.
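The resource-sharing benefit can be seen directly in code: because threads live in the same address space, they all update the same variable. A small sketch; the lock is needed precisely because the counter is shared.

```python
import threading

# Threads share the process's memory: all workers increment one counter.
counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # guard the shared data
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: every thread updated the same address space
```

Separate processes would each have seen a private copy of `counter`; sharing it would have required explicit shared memory or message passing.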
9. Multithreading Models
Support for threads may be provided either at the user level or at the kernel level.
User threads are supported above the kernel and are managed without kernel support, whereas kernel threads are supported and managed directly by the operating system.
10. Multithreading Models
Many-to-One Model:
The many-to-one model maps many user-level threads to one kernel thread.
Thread management is done by the thread library in user space, so it is efficient.
However, only one thread can access the kernel at a time, so multiple threads are unable to run in parallel on multiprocessors.
11. Multithreading Models
One-to-One Model:
The one-to-one model maps each user thread to a kernel thread.
It provides more concurrency than the many-to-one model and allows multiple threads to run in parallel on multiprocessors.
The only drawback of this model is that creating a user thread requires creating the corresponding kernel thread. The overhead of creating kernel threads can burden the performance of an application.
12. Multithreading Models
Many-to-Many Model:
The many-to-many model multiplexes many user-level threads to a smaller or equal number of kernel threads.
The number of kernel threads may be specific to either a particular application or a particular machine.
Developers can create as many user threads as necessary, and the corresponding kernel threads can run in parallel on a multiprocessor.
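As a loose software analogy to the many-to-many model, a thread pool multiplexes many submitted tasks onto a small, fixed set of worker threads. A sketch using Python's standard library; the pool size of 3 and the 20 tasks are arbitrary choices for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Many tasks ("user-level work") multiplexed onto 3 worker threads.
def task(i):
    return (i * i, threading.current_thread().name)

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(task, range(20)))  # map preserves task order

squares = [r[0] for r in results]
workers = {r[1] for r in results}
print(len(squares), len(workers) <= 3)  # 20 True: 20 tasks, at most 3 threads
```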
14. Multithreading: ILP Support to Exploit Thread-Level Parallelism
Although ILP increases the performance of a system, it can be quite limited or hard to exploit in some applications. Furthermore, there may be parallelism occurring naturally at a higher level in the application.
For example, an online transaction-processing system has parallelism among its multiple queries and updates. These queries and updates can be processed mostly in parallel, since they are largely independent of one another.
15. Multithreading: ILP Support to Exploit Thread-Level Parallelism
This higher-level parallelism is called thread-level parallelism (TLP) because it is logically structured as separate threads of execution.
ILP is parallelism among operations within a loop or straight-line code. TLP is represented by the use of multiple threads of execution that run in parallel.
16. Multithreading: ILP Support to Exploit Thread-Level Parallelism
Thread-level parallelism is an important alternative to instruction-level parallelism.
In many applications, thread-level parallelism occurs naturally (as in many server applications).
If software is written from scratch, expressing the parallelism is much easier. But for established applications written without parallelism in mind, there can be significant challenges, and it can be extremely costly to rewrite them to exploit thread-level parallelism.
17. Multithreading: ILP Support to Exploit Thread-Level Parallelism
TLP and ILP exploit two different kinds of parallel structure.
The crucial question is: can we exploit TLP on a processor designed for ILP? The answer is yes.
A datapath designed to exploit ILP often finds that many functional units are idle because of stalls or dependences in the code. Threads can serve as a source of independent instructions that keep the processor busy, thereby exploiting TLP.
18. Multithreading: ILP Support to Exploit Thread-Level Parallelism
Multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion.
To permit this sharing, the processor must duplicate the independent state of each thread. For example, a separate copy of the register file, a separate PC, and a separate page table are required for each thread.
In addition, the hardware must support the ability to change to a different thread relatively quickly.
19. Multithreading: ILP Support to Exploit Thread-Level Parallelism
There are two main approaches to multithreading:
Fine-grained multithreading, and
Coarse-grained multithreading.
20. Multithreading: ILP Support to Exploit Thread-Level Parallelism
Fine-grained multithreading:
It switches between threads on each instruction, causing the execution of multiple threads to be interleaved. This interleaving is often done in a round-robin fashion.
To make fine-grained multithreading practical, the CPU must be able to switch threads on every clock cycle.
Advantage: It can hide the throughput losses that arise from both short and long stalls.
Disadvantage: It slows down the execution of the individual threads.
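The round-robin interleaving described above can be modeled with a toy single-issue simulator. This is an illustrative sketch, not a real pipeline model: each thread is encoded as a list of integers giving the stall cycles that must elapse before each of its instructions can issue.

```python
def fine_grained(threads):
    """Toy single-issue, fine-grained multithreading model.
    threads[i] is a list of ints: stall cycles before each instruction
    of thread i is ready (0 = back-to-back). Each cycle the core picks,
    round-robin, a thread whose next instruction is ready; if none is
    ready, the issue slot is idle. Returns (total_cycles, idle_slots)."""
    nxt = [0] * len(threads)                      # next instruction index
    ready = [t[0] if t else 0 for t in threads]   # cycle it becomes ready
    cycle = idle = rr = 0
    while any(nxt[i] < len(threads[i]) for i in range(len(threads))):
        issued = False
        for k in range(len(threads)):
            i = (rr + k) % len(threads)
            if nxt[i] < len(threads[i]) and ready[i] <= cycle:
                nxt[i] += 1
                if nxt[i] < len(threads[i]):
                    # following instruction ready after its stall elapses
                    ready[i] = cycle + 1 + threads[i][nxt[i]]
                rr = i + 1                        # resume round-robin after i
                issued = True
                break
        if not issued:
            idle += 1                             # no thread ready: wasted slot
        cycle += 1
    return cycle, idle

# One thread with a 3-cycle stall wastes 3 slots; adding a second
# identical thread lets each thread's work hide the other's stalls.
print(fine_grained([[0, 3, 0]]))             # (6, 3)
print(fine_grained([[0, 3, 0], [0, 3, 0]]))  # (8, 2)
```

Run back-to-back, the two threads alone would take 12 cycles with 6 idle slots; interleaved, they take 8 cycles with only 2, which is the throughput advantage described above. The individual threads still finish later than they would running alone, which is the disadvantage.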
21. Multithreading: ILP Support to Exploit Thread-Level Parallelism
Coarse-grained multithreading:
It was invented as an alternative to fine-grained multithreading. Coarse-grained multithreading switches threads only on costly (long) stalls.
Advantage: It relieves the need for very fast thread switching.
Disadvantage: It is likely to slow the processor down, since instructions from other threads are issued only when a thread encounters a costly (long) stall.
22. Multithreading: ILP Support to Exploit Thread-Level Parallelism
A CPU with coarse-grained multithreading issues instructions from a single thread. When a stall occurs, the pipeline must be emptied or frozen, and the new thread that executes after the stall must fill the pipeline.
Because of this start-up overhead, coarse-grained multithreading is most useful for reducing the penalty of high-cost stalls, where the pipeline refill time is negligible compared to the stall time.
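The start-up overhead can be illustrated with a matching toy single-issue model. As before, this is only a sketch: each thread is a list of integers giving the stall cycles before each instruction, and the 2-cycle pipeline-refill penalty charged on every switch is an arbitrary assumption.

```python
def coarse_grained(threads, refill=2):
    """Toy single-issue, coarse-grained multithreading model.
    The core runs one thread until it hits a stall, then switches to
    another thread with work, paying `refill` idle cycles to restart
    the pipeline. Returns (total_cycles, idle_slots)."""
    nxt = [0] * len(threads)                      # next instruction index
    ready = [t[0] if t else 0 for t in threads]   # cycle it becomes ready
    cycle = idle = 0
    cur = 0
    while any(nxt[i] < len(threads[i]) for i in range(len(threads))):
        if nxt[cur] >= len(threads[cur]):         # current thread finished
            cur = (cur + 1) % len(threads)
            continue
        if ready[cur] <= cycle:
            nxt[cur] += 1
            if nxt[cur] < len(threads[cur]):
                ready[cur] = cycle + 1 + threads[cur][nxt[cur]]
            cycle += 1
        else:
            # Stall: switch to another runnable thread, paying the refill.
            others = [i for i in range(len(threads))
                      if i != cur and nxt[i] < len(threads[i])]
            if others:
                cur = others[0]
                idle += refill
                cycle += refill
            else:
                idle += 1                         # nothing else to run
                cycle += 1
    return cycle, idle

# With only short (3-cycle) stalls, the refill penalty eats most of the
# benefit of switching; a long stall would amortize it much better.
print(coarse_grained([[0, 3, 0]]))             # (6, 3)
print(coarse_grained([[0, 3, 0], [0, 3, 0]]))  # (10, 4)
```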
23. Converting Thread-Level Parallelism into Instruction-Level Parallelism
Simultaneous multithreading (SMT) is a variation on multithreading that uses the resources of a multiple-issue, dynamically scheduled processor to exploit TLP.
Multiple-issue processors often have more functional-unit parallelism available than a single thread can use, which motivates the use of SMT.
With register renaming and dynamic scheduling, multiple instructions from independent threads can be issued without regard to the dependences among them.
24. Converting Thread-Level Parallelism into Instruction-Level Parallelism
The figure illustrates the differences in a processor's ability to exploit the resources of a superscalar for the following configurations:
A superscalar with no multithreading support,
A superscalar with coarse-grained multithreading,
A superscalar with fine-grained multithreading, and
A superscalar with simultaneous multithreading.
25. Converting Thread-Level Parallelism into Instruction-Level Parallelism
In the superscalar without multithreading support, the use of issue slots is limited by a lack of ILP. In addition, a major stall, such as an instruction cache miss, can leave the entire processor idle.
(Figure legend: an empty (white) box indicates that the corresponding issue slot is unused in that clock cycle; black indicates an occupied issue slot.)
26. Converting Thread-Level Parallelism into Instruction-Level Parallelism
In the coarse-grained multithreaded superscalar, the long stalls are partially hidden by switching to another thread that uses the resources of the processor.
This reduces the number of completely idle clock cycles; within each clock cycle, however, the ILP limitations still lead to idle issue slots. And because thread switching occurs only on a stall, some fully idle cycles remain.
(Figure legend: the shades of grey and black correspond to different threads in the multithreaded processors.)
27. Converting Thread-Level Parallelism into Instruction-Level Parallelism
In the fine-grained multithreaded superscalar, the interleaving of threads eliminates fully empty cycles. Because only one thread issues instructions in a given clock cycle, ILP limitations still lead to a significant number of idle slots within individual clock cycles.
(Figure legend: an empty (white) box indicates that the corresponding issue slot is unused in that clock cycle; the shades of grey and black correspond to four different threads.)
28. Converting Thread-Level Parallelism into Instruction-Level Parallelism
In SMT, TLP and ILP are exploited simultaneously. Ideally, issue slot usage is limited only by imbalances between the resource needs and the resource availability over multiple threads.
In practice, other factors can also restrict how many slots are used:
- how many active threads are considered,
- finite limitations on buffers,
- the ability to fetch enough instructions from multiple threads, and
- practical limitations on what instruction combinations can issue from one thread and from multiple threads.
29. Converting Thread-Level Parallelism into Instruction-Level Parallelism
Design Challenges in SMT:
Because a dynamically scheduled superscalar processor has a deep pipeline, coarse-grained multithreading gains little in performance. Since SMT makes sense only in a fine-grained implementation, we must consider the impact of fine-grained scheduling on single-thread performance.
This effect can be minimized by having a preferred thread, which still permits multithreading to preserve some of its performance advantage with a smaller compromise in single-thread performance.
30. Converting Thread-Level Parallelism into Instruction-Level Parallelism
Design Challenges in SMT:
Other design challenges for an SMT processor include:
Dealing with the larger register file needed to hold multiple contexts.
Not affecting the clock cycle, particularly in instruction issue, where more instructions need to be considered, and in instruction commit, where choosing which instructions to commit may be challenging.
Ensuring that the cache and TLB conflicts generated by the simultaneous execution of multiple threads do not cause significant performance degradation.
31. Converting Thread-Level Parallelism into Instruction-Level Parallelism
Design Challenges in SMT:
In many cases, the potential performance overhead due to multithreading is small.
The efficiency of current superscalars is low enough that there is scope for significant improvement, even at the cost of some overhead.
33. Performance and Efficiency in Advanced Multiple-Issue Processors
The question of efficiency in terms of silicon area and power is equally critical, and power is the major constraint on modern processors.
The Itanium 2 is the most inefficient processor for both floating-point and integer code. The Athlon and Pentium 4 both make good use of transistors and area in terms of efficiency, while the IBM Power5 is the most effective user of energy.
In fact, none of the processors offers a great advantage in efficiency.
34. Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
Power is a function of both static power (proportional to the transistor count, whether or not the transistors are switching) and dynamic power (proportional to the product of the number of switching transistors and the switching rate).
Static power is certainly a design concern, but dynamic power is usually the dominant energy consumer.
A microprocessor trying to achieve both a low CPI and a high clock rate must switch more transistors and switch them faster.
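The dynamic-power relation above can be turned into a back-of-the-envelope check. The ratios below are hypothetical, purely for illustration.

```python
# Dynamic power scales roughly with (number of switching transistors)
# times (switching rate), holding supply voltage fixed:
#     P_dyn ~ N_switch * f
def relative_dynamic_power(transistor_ratio, clock_ratio):
    return transistor_ratio * clock_ratio

# Hypothetical wider-issue design: 2x the switching transistors at
# 1.2x the clock rate -> 2.4x the dynamic power.
power_ratio = relative_dynamic_power(2.0, 1.2)
perf_ratio = 1.6   # assumed speedup, for illustration only
print(power_ratio)                     # 2.4
print(perf_ratio / power_ratio < 1.0)  # True: performance per watt drops
```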
35. Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
Most techniques used for increasing performance (multiple cores and multithreading) increase power consumption as well.
The key question is whether a technique is energy efficient: does it increase power consumption faster than it increases performance?
36. Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
This inefficiency arises from two primary characteristics.
First, issuing multiple instructions incurs overhead in logic that grows faster than the issue rate. This logic is responsible for instruction issue analysis, including dependence checking, register renaming, and similar functions.
The combined result is that lower CPIs are likely to lead to lower ratios of performance per watt, simply because of this overhead.
37. Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
Second, there is a growing gap between peak issue rates and sustained performance. The number of transistors switching is proportional to the peak issue rate, while performance is proportional to the sustained rate.
For example, to sustain four instructions per clock, we must fetch more, issue more, and initiate execution on more than four instructions. Power is thus proportional to the peak rate, but performance is at the sustained rate.
38. Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
An important technique for increasing the exploitation of ILP, speculation, is inherently inefficient because it can never be perfect.
If speculation were perfect, it could save power, since it would reduce the execution time and save static power. When speculation is not perfect, it rapidly becomes energy inefficient, since the mis-speculated work requires additional dynamic power.
39. Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
Focusing on improving clock rate: increasing the clock rate increases transistor switching frequency and directly increases power consumption.
To achieve a faster clock rate, we would also need to increase pipeline depth, and deeper pipelines incur additional overhead penalties as well as causing higher switching rates.