This document discusses the limitations of instruction-level parallelism (ILP). It reviews ILP background using a MIPS example, then describes the hardware models used in limit studies, including their assumptions about register renaming and branch and jump prediction. The study of ILP limitations found diminishing returns as the window size grows, and realizable processors are further limited by complexity and power constraints. Simultaneous multithreading was explored as a way to use functional units that ILP alone leaves idle, but it has its own design challenges. Today, x86 and ARM processors employ various ILP optimizations within their pipeline constraints.
7. I. ILP: Structural Hazards
Conflict over the use of resources
[Figure: pipeline diagram over time (clock cycles) for Load and Instr 1 to Instr 4; each instruction passes through the stages I$, Reg, ALU, D$, Reg.]
9. I. ILP: Structural Hazards
[Figure: pipeline diagram over time (clock cycles) for Load and Instr 1 to Instr 4; each instruction passes through the stages I$, Reg, ALU, D$, Reg.]
Solutions (register R/W):
• In the same clock cycle
• On different R/W ports
10. I. ILP: Data Hazards
[Figure: pipeline diagram over time (clock cycles) for Instr 1 to Instr 5; each instruction passes through the stages I$, Reg, ALU, D$, Reg.]
11. I. ILP: Data Hazards
add $t0, $t1, $t2
sub $t4, $t0, $t3
and $t5, $t0, $t6
or $t7, $t0, $t8
xor $t9, $t0, $t10
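In the sequence above, every instruction after the add reads $t0, which the add writes. The following is a minimal Python sketch that finds these read-after-write (RAW) dependences. The helper names (parse, raw_dependences) are hypothetical, and the parser assumes three-register instructions with the destination listed first, which holds for this sequence but not for MIPS in general.

```python
import re

def parse(line):
    """Split 'op $dst, $src1, $src2' into (op, dst, [srcs])."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    return op, regs[0], regs[1:]

def raw_dependences(program):
    """Return (producer_index, consumer_index, register) triples."""
    last_writer = {}  # register -> index of the latest instruction that writes it
    deps = []
    for i, line in enumerate(program):
        _, dst, srcs = parse(line)
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        last_writer[dst] = i
    return deps

PROGRAM = [
    "add $t0, $t1, $t2",
    "sub $t4, $t0, $t3",
    "and $t5, $t0, $t6",
    "or  $t7, $t0, $t8",
    "xor $t9, $t0, $t10",
]

for producer, consumer, reg in raw_dependences(PROGRAM):
    print(f"instruction {consumer} reads {reg} written by instruction {producer}")
```

Whether a given dependence actually stalls the pipeline depends on the distance between producer and consumer and on the forwarding hardware, which this sketch does not model.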
20. I. ILP: Control Hazards
Solutions:
• Add hardware so that the branch can be computed in stage 2 (DECODE).
• Predict the branch: to simplify the hardware, predict branches as NOT TAKEN by default. The branch at the end of a loop will always be mispredicted, but that happens only once per loop.
• Branch delay slot: the compiler inserts an instruction after the branch that always gets executed (as in MIPS).
22. I. ILP: Optimizations
Instruction window: The trace of incoming instructions that is analyzed for execution.
Register Renaming: When a data dependence is false (WAR or WAW), hardware can rename the register.
Compilers should also optimize away these false dependences: R2R memory model.
Branch Prediction:
Static: Always not taken, always taken. Forward/Backward taken. Branch delay slot.
Dynamic: One-level (1bit, 2bit...), Two-level and Multiple Component
Jump Prediction: Static profiling. Dynamic: Last taken, 2bit tables, return stack.
Alias Analysis: Indirect memory references. Instruction Inspection.
23. I. ILP: Branch Prediction
Saturating counter: Incremented when the branch is taken, decremented when it is not taken.
It never overflows or underflows.
Branch correlation: Inter/Intra
Two-level: Remembers the history of the last n occurrences of the branch and
uses one saturating counter for each of the possible 2^n history patterns.
Many more...
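The two mechanisms above can be sketched in Python as follows. This is an illustrative model under simple assumptions, not the design of any particular processor: the class names are hypothetical, counters start at 0, and the two-level predictor tracks a single branch.

```python
class SaturatingCounter:
    """n-bit counter that stops at its minimum and maximum values."""

    def __init__(self, bits=2, value=0):
        self.max = (1 << bits) - 1
        self.value = value

    def update(self, taken):
        if taken:
            self.value = min(self.value + 1, self.max)
        else:
            self.value = max(self.value - 1, 0)

    def predict(self):
        # Upper half of the range predicts taken (2 or 3 for a 2-bit counter).
        return self.value > self.max // 2


class TwoLevelPredictor:
    """History of the last n outcomes selects one of 2**n saturating counters."""

    def __init__(self, history_bits):
        self.n = history_bits
        self.history = 0
        self.counters = [SaturatingCounter() for _ in range(1 << history_bits)]

    def predict(self):
        return self.counters[self.history].predict()

    def update(self, taken):
        self.counters[self.history].update(taken)
        mask = (1 << self.n) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
```

On a strictly alternating taken/not-taken branch, a single counter keeps flipping, while the two-level predictor learns one counter per history pattern and stops mispredicting after a short warm-up.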
24. CONTENT
I. ILP Background
II. Hardware Model
III. Study of Limitations
IV. Simultaneous Multithreading
V. ILP today
26. II. HW MODEL: Profiling Framework
A set of assumptions and a methodology for experimentally extracting a parallelism
profile from a set of benchmarks.
Each program is executed to completion, producing a trace of instructions.
The trace includes the data addresses referenced and the outcomes of branches and jumps.
(D. Wall's 1993 study) The trace is divided into cycles of up to 64 instructions in flight.
The only limits on ILP in such a processor are those imposed by the actual data
flows through either registers or memory.
27. II. HW MODEL: Assumptions
• No limits on replicated functional units or ports to registers or memory.
• Register Renaming: Perfect, Infinite, Finite, None
• Branch Prediction: Perfect, Infinite, Finite, None
• Jump Prediction: Perfect, Infinite, Finite, None
• Memory Address Alias Analysis: Perfect, Inspection, None
• Perfect Caches
• Unit latency (one cycle per operation)
• Window size of 2K instructions
28. II. HW MODEL: Register Renaming
• Perfect: Infinite number of registers to avoid false register dependencies.
• Finite: Normally 256 integer registers and 256 floating point registers used in LRU
(Least Recently Used) fashion.
• No renaming: Only the registers used in the code are available.
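A minimal Python sketch of the "perfect" model: every write receives a fresh register from an unbounded pool, so only true (RAW) dependences remain. The function name and the tuple encoding of instructions are assumptions made for illustration; the finite model with LRU replacement is not shown.

```python
from itertools import count

def rename(program):
    """program: list of (dst, src1, src2, ...) architectural register names.

    Returns the same instructions rewritten over physical registers p0, p1, ...
    """
    fresh = (f"p{i}" for i in count())   # unbounded supply: the "perfect" model
    mapping = {}                         # architectural -> current physical register
    renamed = []
    for dst, *srcs in program:
        new_srcs = []
        for src in srcs:
            # Sources use the most recent mapping, so true dependences are kept.
            if src not in mapping:
                mapping[src] = next(fresh)
            new_srcs.append(mapping[src])
        # Each write gets a new register, which removes WAW and WAR dependences.
        mapping[dst] = next(fresh)
        renamed.append((mapping[dst], *new_srcs))
    return renamed

code = [
    ("r1", "r2", "r3"),   # writes r1
    ("r4", "r1", "r5"),   # reads r1: true dependence on the first instruction
    ("r1", "r6", "r7"),   # rewrites r1: WAW with the first, WAR with the second
]
print(rename(code))
```

After renaming, the third instruction writes a different physical register than the first, so it no longer has to wait for the second instruction to read r1.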
29. II. HW MODEL: Branch Prediction
• Perfect: All branches are correctly predicted.
• 2-bit predictor with finite tables: Dynamic. One 2-bit counter per table entry,
indexed by the low-order bits of the branch's address. The counter is incremented when the branch is taken
and saturates instead of overflowing. The branch is predicted taken if the table entry is 2 or 3. Up to 512 2-bit entries.
• 2-bit predictor with infinite tables: Infinite number of counters.
• Tournament-based branch predictor: Two 2-bit predictors compete. A 2-bit selector is
incremented or decremented according to which of the two table entries predicted correctly.
• Profile based: Static predictions.
• No prediction: Every branch is predicted wrong.
(Listed in no particular order of performance.)
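The finite-table and tournament schemes listed above can be sketched in Python as follows. This is a simplified illustration: the class names are hypothetical, instructions are assumed to be 4-byte aligned (so the two lowest address bits are dropped before indexing), and the tournament uses one global selector, whereas real designs usually keep a table of selectors.

```python
class BimodalTable:
    """Finite table of 2-bit counters indexed by low-order branch address bits."""

    def __init__(self, entries=512):
        self.entries = entries
        self.table = [0] * entries        # each counter stays in the range 0..3

    def _index(self, pc):
        return (pc >> 2) % self.entries   # assumption: 4-byte aligned instructions

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.table[i] + 1, 3)
        else:
            self.table[i] = max(self.table[i] - 1, 0)


class AlwaysTaken:
    """Trivial static predictor, used here only as a second competitor."""

    def predict(self, pc):
        return True

    def update(self, pc, taken):
        pass


class Tournament:
    """A 2-bit selector chooses between two component predictors."""

    def __init__(self, first, second):
        self.first, self.second = first, second
        self.selector = 0                 # 0..1 favours first, 2..3 favours second

    def predict(self, pc):
        chosen = self.second if self.selector >= 2 else self.first
        return chosen.predict(pc)

    def update(self, pc, taken):
        first_ok = self.first.predict(pc) == taken
        second_ok = self.second.predict(pc) == taken
        if second_ok and not first_ok:
            self.selector = min(self.selector + 1, 3)
        elif first_ok and not second_ok:
            self.selector = max(self.selector - 1, 0)
        self.first.update(pc, taken)
        self.second.update(pc, taken)
```

Because the table is finite, two branches whose addresses differ by a multiple of the table size share a counter and can disturb each other's predictions; the infinite-table model removes that interference.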
30. II. HW MODEL: Jump Prediction
• Direct Jumps are known.
• Indirect jumps
– Perfect: Always performed correctly.
– Finite prediction: A table of destination addresses. The address of a jump
provides the index into the table. Whenever a jump is executed, its destination address is stored in
the table. The next execution of that jump is predicted to go to the address in the table.
– Infinite prediction: Infinite table entries.
• No prediction: Every jump is predicted wrong.
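The finite and infinite table schemes for indirect jumps can be sketched as a last-target table. The Python below is a hedged illustration: the class and function names are hypothetical, and the finite table drops two address bits before indexing on the assumption of 4-byte aligned instructions.

```python
class JumpTargetTable:
    """Predicts that an indirect jump goes where it went last time."""

    def __init__(self, entries=None):
        self.entries = entries            # None models the infinite table
        self.table = {}

    def _index(self, pc):
        if self.entries is None:
            return pc
        return (pc >> 2) % self.entries   # assumption: 4-byte aligned instructions

    def predict(self, pc):
        return self.table.get(self._index(pc))   # None means no prediction yet

    def update(self, pc, target):
        self.table[self._index(pc)] = target


def mispredictions(trace, predictor):
    """trace: list of (jump_address, actual_target) pairs in execution order."""
    wrong = 0
    for pc, target in trace:
        if predictor.predict(pc) != target:
            wrong += 1
        predictor.update(pc, target)
    return wrong


# Two jumps that alternate; with a one-entry table they evict each other.
trace = [(0x100, 0x500), (0x104, 0x700), (0x100, 0x500), (0x104, 0x700)]
print(mispredictions(trace, JumpTargetTable()))           # infinite table
print(mispredictions(trace, JumpTargetTable(entries=1)))  # finite table
```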
31. II. HW MODEL: Alias Analysis
• If two memory references do not refer to the same address, then they may
be safely interchanged.
• Indirect memory references must be resolved before the instruction executes.
• No need to predict the actual values, only whether those values conflict.
• Perfect: All global and stack reference predictions are perfect, heap
• Inspection: Examine base and offset
• None: All indirect memory references conflict.
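The "inspection" and "none" models can be sketched as a conflict test on pairs of memory references. The Python below is an illustrative approximation, not the exact rules of the study: MemRef and may_alias are hypothetical names, and the same-base comparison assumes the base register is not modified between the two references. The "perfect" model would compare the actual addresses recorded in the trace and is not shown.

```python
from typing import NamedTuple

class MemRef(NamedTuple):
    base: str       # base register name, e.g. "$sp"
    offset: int     # constant byte offset
    size: int = 4   # access width in bytes

def may_alias(a, b, model="inspection"):
    """Return True if the two references must be assumed to conflict."""
    if model == "none":
        return True   # no analysis: every pair is assumed to conflict
    if model == "inspection":
        if a.base != b.base:
            return True   # cannot decide by looking at the instructions
        # Same base register: conflict only if the byte ranges overlap.
        return a.offset < b.offset + b.size and b.offset < a.offset + a.size
    raise ValueError(f"unknown model: {model}")

print(may_alias(MemRef("$sp", 0), MemRef("$sp", 4)))   # disjoint words
print(may_alias(MemRef("$sp", 0), MemRef("$t0", 4)))   # different bases
```

Only pairs for which may_alias returns False may be safely interchanged by the scheduler.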
32. II. HW MODEL: Window Size
• The window is the set of instructions that is examined for simultaneous execution.
• The cycle width limits the number of instructions that can be scheduled per cycle.
• A window size of 2k will look at 2048 instructions.
• Cycle width: Assume we have found 111 instructions that can be parallelized. A
cycle width of 64 would limit actual parallelism to 64 in-flight instructions.
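The interaction between window size and cycle width can be sketched with a greedy trace scheduler in Python. This is a simplified model under stated assumptions, not the simulator used in the study: only register RAW dependences are tracked (which amounts to perfect renaming with no memory dependences), every operation has unit latency, and an instruction may not issue earlier than one cycle after the instruction `window` positions ahead of it. The function names are hypothetical.

```python
from collections import defaultdict

def schedule(trace, window=2048, width=64):
    """trace: list of (dst, [srcs]). Returns the issue cycle of each instruction."""
    ready = {}                  # register -> first cycle its value is available
    used = defaultdict(int)     # cycle -> number of instructions issued in it
    cycles = []
    for i, (dst, srcs) in enumerate(trace):
        earliest = max((ready.get(s, 0) for s in srcs), default=0)
        if i >= window:
            # The window slot is freed one cycle after instruction i - window issues.
            earliest = max(earliest, cycles[i - window] + 1)
        cycle = earliest
        while used[cycle] >= width:   # cycle is full, try the next one
            cycle += 1
        used[cycle] += 1
        cycles.append(cycle)
        ready[dst] = cycle + 1        # unit latency
    return cycles

def ilp(trace, **kwargs):
    """Average number of instructions issued per cycle."""
    cycles = schedule(trace, **kwargs)
    return len(trace) / (max(cycles) + 1)

# 111 mutually independent instructions, as in the example above.
independent = [(f"r{i}", []) for i in range(111)]
print(ilp(independent, window=2048, width=64))   # limited by the cycle width
print(ilp(independent, window=8, width=64))      # limited by the window size
```

With a cycle width of 64, the 111 independent instructions need two cycles (64 in the first, 47 in the second), so the measured parallelism is 55.5 rather than 111.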
33. II. HW MODEL
ctr: counter
gsh: gshared (global history)
35. III. STUDY OF LIMITATIONS
• Effects of...
– Register Renaming
– Branch/Jump Prediction
– Alias Analysis
– Realizable processor
• Window Size (Discrete/Continuous)
• Results
42. III. LIMITATIONS: Realizable Processor
• Up to 64 instruction issues per clock with no issue restrictions, or roughly 10 times the
total issue width of the widest processor in 2011
• A tournament predictor with 1K entries and a 16-entry return predictor. This predictor
is comparable to the best predictors in 2011; the predictor is not a primary bottleneck
• Perfect disambiguation of memory references done dynamically—this is ambitious but
perhaps attainable for small window sizes (and hence small issue rates and load-store
buffers) or through address aliasing prediction
• Register renaming with 64 additional integer and 64 additional FP registers, which is
slightly less than the most aggressive processor in 2011
• No issue restrictions, no cache misses, unit latencies
• Variable Window Size (Power5 200, Intel Core i7 ~128)
44. III. LIMITATIONS: Conclusions
• Plateau behavior
• The effect of window size on the integer programs (top 3) is
not as severe as on the FP programs, which benefit from loop-level parallelism.
• Designers are faced with the challenge:
– Simpler processors with larger caches and
higher clock rates
Vs
– ILP with slower clock and smaller caches
• Persistent limitations:
– WAW and WAR hazards through memory
– Unnecessary dependences
– Data flow limit
46. IV. SIMULTANEOUS MULTITHREADING
• TLP Background
– TLP approaches
– Design Challenges
• Limits of Multiple-Issue Processors
– Power
– Complexity
47. IV. SMT: TLP Background
• Threads are largely independent
– Each has a separate copy of the register file, PC and page table
• A thread could represent
– A process that is part of a parallel program consisting of multiple processes
– An independent program on its own
• Thread-level parallelism occurs naturally
• It can be used to employ functional units that would otherwise sit idle when ILP is insufficient
49. IV. SMT: Changes
• Increasing the associativity of the L1 instruction cache and the instruction
address translation buffers
• Adding per-thread load and store queues
• Increasing the size of the L2 and L3 caches
• Adding separate instruction prefetch and buffering
• Increasing the number of virtual registers from 152 to 240
• Increasing the size of several issue queues
51. IV. SMT: Results
• SMT reduces energy by 7%
• “Because of the costs and diminishing returns in performance, however, rather than
implement wider superscalars and more aggressive versions of SMT, many designers are
opting to implement multiple CPU cores on a single die with slightly less aggressive support
for multiple issue and multithreading; we return to this topic in the next chapter.” - Hennessy
et al.
53. V. ILP TODAY: x86
• Instruction fetch—The processor
uses a multilevel branch target buffer
to achieve a balance between speed
and prediction accuracy. There is also a
return address stack to speed up
function return. Mispredictions cause a
penalty of about 17 cycles. Using the
predicted address, the instruction fetch
unit fetches 16 bytes from the
instruction cache.
• Micro-code and Macro-code
• Total pipeline depth is 14 stages
• Reorder (renaming) buffer with 128 entries
54. V. ILP TODAY: x86
• Hyper-Threading:
– SMT
– The processor may stall due to a
cache miss, branch misprediction,
or data dependency.
– Branch misprediction costs 17
cycles
56. V. ILP TODAY: ARM
- The average CPI for the ARM7 family is about 1.9 cycles per instruction.
- The average CPI for the ARM9 family is about 1.5 cycles per instruction.
- The average CPI for the ARM11 family is about 1.39 cycles per instruction.
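To relate these CPI figures to ILP, the average number of instructions per cycle is the reciprocal of the CPI. The short Python sketch below does the arithmetic. The speedup function assumes the same instruction count and the same clock rate on both families, which is a hypothetical simplification and not a claim about real ARM parts.

```python
CPI = {"ARM7": 1.9, "ARM9": 1.5, "ARM11": 1.39}   # values from the slide

def ipc(cpi):
    """Average instructions completed per cycle."""
    return 1.0 / cpi

def speedup(cpi_old, cpi_new):
    """Speedup from CPI alone, assuming equal instruction count and clock rate."""
    return cpi_old / cpi_new

for family, cpi in CPI.items():
    print(f"{family}: CPI {cpi} -> IPC {ipc(cpi):.2f}")
print(f"ARM7 -> ARM11 from CPI alone: {speedup(CPI['ARM7'], CPI['ARM11']):.2f}x")
```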
57. SOURCES
Computer Architecture: A Quantitative Approach. Hennessy, J.L., Patterson, D.A., Asanović, K.
5th Ed. 2012. Morgan Kaufmann/Elsevier.
Limits of instruction-level parallelism. D. W. Wall. 4th International Conference on Architectural
Support for Programming Languages and Operating Systems (ASPLOS), pages 176–188, 1991.
Computer Science 61C - Lecture 31: Instruction Level Parallelism. Mike Franklin, Dan Garcia.
UC Berkeley. Fall. 2011
ILP and TLP in Shared Memory Applications: A Limit Study. E. Fatehi, P. V. Gratz, Proceedings of
the 23rd international conference on Parallel architectures and compilation, pages 113-126, 2014.
MIPS Multicycle Model: Pipelining. Michael Langer. Introduction to Computer Systems. McGill
University. 2012.
IBM Power5 Chip: A Dual-Core Multithreaded Processor. R. Kalla, B. Sinharoy, J. M. Tendler. IBM.
IEEE CS. 2004.