The document discusses pipeline hazards in computer architecture and their resolution mechanisms. There are three types of hazards: structural hazards which occur due to resource conflicts, data hazards which occur when an instruction depends on data from a prior instruction, and control hazards which occur due to conditional branch instructions. Data hazards include read after write (RAW), write after read (WAR), and write after write (WAW) hazards. Forwarding is a common technique to resolve data hazards by passing results directly from one pipeline stage to another as needed. Stalls can also resolve hazards by inserting no-operation instructions but reduce pipeline efficiency.
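The effect of forwarding described above can be sketched in a few lines. This is an illustrative model, not code from the document: instructions are hypothetical (dest, src1, src2) register tuples, and the two-cycle stall penalty assumes a classic 5-stage pipeline where, without forwarding, a consumer must wait for the producer's write-back.

```python
# Hypothetical sketch: counting stall cycles for RAW hazards in a classic
# 5-stage pipeline (IF ID EX MEM WB), with and without forwarding.
# Instructions are (dest, src1, src2) register tuples; names are illustrative.

def stall_cycles(instrs, forwarding):
    stalls = 0
    for i in range(1, len(instrs)):
        dest_prev = instrs[i - 1][0]
        srcs = instrs[i][1:]
        if dest_prev in srcs:
            # Without forwarding the consumer waits for the producer's WB
            # stage; with EX->EX forwarding an ALU result is available to
            # the very next instruction, so no stall is needed.
            stalls += 0 if forwarding else 2
    return stalls

prog = [("r1", "r2", "r3"),   # add r1, r2, r3
        ("r4", "r1", "r5"),   # add r4, r1, r5  (RAW dependency on r1)
        ("r6", "r7", "r8")]   # independent instruction

print(stall_cycles(prog, forwarding=False))  # 2
print(stall_cycles(prog, forwarding=True))   # 0
```

The same dependent pair costs two bubbles when stalling but none when the ALU result is forwarded, which is why forwarding is preferred wherever the hardware allows it.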
Model checking is a formal verification technique that employs a model of a system to check that the system meets its desired specifications. It works by systematically checking all possible states of a modeled system to verify its behavior. Model checking is effective at finding potential design errors early in the development process and is widely used to verify hardware, software, security protocols, and safety-critical systems like transportation systems. An important part of model checking involves building a transition system model of a system's states and the transitions between those states labeled with actions and atomic propositions.
Formal Specification in Software Engineering, SE9 (koolkampus)
This document discusses formal specification techniques for software. It describes algebraic techniques for specifying interfaces as abstract data types and model-based techniques for specifying system behavior. Algebraic specifications define operations and their relationships, while model-based specifications represent the system state using mathematical constructs like sets and sequences. Formal specification finds errors earlier and reduces rework, though it requires more upfront effort. The document also provides an example of formally specifying an air traffic control system and insulin pump.
This document discusses the nature of software. It defines software as a set of instructions that can be stored electronically. Software engineering encompasses processes and methods to build high quality computer software. Software has a dual role as both a product and a vehicle to deliver products. Characteristics of software include being engineered rather than manufactured, and not wearing out over time like hardware. Software application domains include system software, application software, engineering/scientific software, embedded software, product-line software, web applications, and artificial intelligence software. The document also discusses challenges like open-world computing and legacy software.
Formal verification refers to mathematical techniques for specifying, designing, and verifying software and hardware systems. It involves proving or disproving the correctness of algorithms in a system with respect to a formal specification or property using formal methods of mathematics. Formal verification techniques include manual proofs, semi-automatic theorem proving, and automatic algorithms that take a model and property to determine if the model satisfies the property. Formal verification is commonly used for safety-critical systems like embedded systems to help ensure correctness. Tools like VC formal, VC LP, and Spyglass can be used to formally verify designs early in development without complex test benches or stimulus.
Software re-engineering is a process of examining and altering a software system to restructure it and improve maintainability. It involves sub-processes like reverse engineering, redocumentation, and data re-engineering. Software re-engineering is applicable when some subsystems require frequent maintenance and can be a cost-effective way to evolve legacy software systems. The key advantages are reduced risk compared to new development and lower costs than replacing the system entirely.
Discussed different types of dynamic interconnection networks. Graphically demonstrated single and multiple bus interconnection networks. Discussed different types of switch based interconnection networks. Graphically shown the mechanisms of crossbar, single and multistage interconnection networks. Graphically explained the working principle of omega network, Benes network, and baseline networks.
The COCOMO model is a widely used software cost estimation model that predicts development effort and schedule based on project attributes. It includes basic, intermediate, and detailed models of increasing complexity. The intermediate model estimates effort as a function of source lines of code and cost drivers. The detailed model further incorporates the impact of cost drivers on development phases. COCOMO 2 expands on this with application composition, early design, reuse, and post-architecture models for different project stages.
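The basic COCOMO equations mentioned above fit in a couple of lines. This is a minimal sketch using the standard organic-mode constants (a=2.4, b=1.05 for effort; c=2.5, d=0.38 for schedule); the function name and the 32 KLOC figure are my own illustration, not from the document.

```python
# A minimal sketch of the basic COCOMO effort and schedule equations
# using organic-mode constants (a=2.4, b=1.05, c=2.5, d=0.38).
def cocomo_basic(kloc, a=2.4, b=1.05, c=2.5, d=0.38):
    effort = a * kloc ** b        # estimated effort in person-months
    schedule = c * effort ** d    # estimated development time in months
    return effort, schedule

effort, months = cocomo_basic(32)  # a hypothetical 32 KLOC organic project
print(f"{effort:.1f} person-months over {months:.1f} months")
```

The intermediate model multiplies this nominal effort by a product of cost-driver ratings; the detailed model applies those drivers per phase.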
This document outlines a presentation on pipelining and data hazards in microprocessors. It begins with rules for participant questions and outlines the topics to be covered: what pipelining is, types of pipeline hazards, data hazards and their types, and solutions to data hazards. It then defines pipelining as executing subsequent instructions before prior ones complete. The types of hazards covered are control, data, and structural hazards. Data hazards occur if an instruction uses a value before it is ready; their types are RAW, WAR, and WAW. Solutions involve forwarding newer register values to bypass stale values in the pipeline and prevent hazards.
Multithreading allows exploiting thread-level parallelism (TLP) to improve processor utilization. There are several categories of multithreading:
- Superscalar simultaneous multithreading interleaves instructions from multiple threads within a single out-of-order processor core to reduce idle resources.
- Coarse-grained multithreading switches between threads on long-latency events like cache misses to hide latency.
- Fine-grained multithreading interleaves threads at a finer instruction granularity in in-order cores.
- Multiprocessing physically separates threads onto multiple processor cores.
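As a rough software analogy (not from the document), overlapping long-latency waits across threads shows why multithreading improves utilization: while one thread is stalled, another makes progress.

```python
# Illustrative sketch: hiding long-latency events by overlapping them across
# threads - the idea behind coarse-grained multithreading. The sleep stands
# in for a cache miss or other long-latency stall.
import threading
import time

def worker(delay):
    time.sleep(delay)  # simulated long-latency stall

start = time.time()
threads = [threading.Thread(target=worker, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# Four 0.2 s stalls overlap, so total time is ~0.2 s rather than ~0.8 s.
print(f"elapsed ~{elapsed:.1f}s")
```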
Software Engineering: A Practitioner's Approach, 8th edition, Pressman solutions ... (Drusilla918)
Process models describe the life cycle of software development from requirements gathering to maintenance. The main process models discussed are waterfall, incremental, RAD, prototype, spiral and concurrent development. Each model represents the phases and flow of activities in the software development process in a different way. Process models help develop software in a systematic manner and ensure all team members understand responsibilities and timelines.
An introduction to software engineering, based on the first chapter of "A (Partial) Introduction to Software Engineering Practices and Methods" by Laurie Williams.
A structure chart is a top-down diagram that shows the breakdown of a system into manageable sub-modules. It represents each module as a box with lines connecting them to show relationships. Structure charts are used in software engineering to plan program structure and divide a problem into smaller tasks. They provide a hierarchical visualization of how a program or system is decomposed.
This presentation is about the planning process in AI, specifically partial-order planning (POP). Other planning approaches also exist. With the help of an example, the presentation briefly explains how planning is done in AI.
Software Engineering (Introduction to Software Engineering) (ShudipPal)
Software engineering is concerned with all aspects of software production. It aims to develop software using systematic and disciplined approaches to reduce errors and costs. Some key challenges in software development are its high cost, difficulty delivering on time, and producing low quality software. Software engineering methods strive to address these challenges and produce software with attributes like maintainability, dependability, efficiency, usability and acceptability.
Evolutionary models are iterative and incremental software development approaches that combine iterative and incremental processes. There are two main types: prototyping and spiral models. The prototyping model develops prototypes that are tested and refined based on customer feedback until requirements are met, while the spiral model proceeds through multiple loops or phases of planning, risk analysis, engineering, and evaluation. Both approaches allow requirements to evolve through development and support risk handling.
Round-robin scheduling assigns CPU bursts to processes in time quanta of 4 time units. For three processes, all arriving at time 0 with burst times of 24, 3, and 3 time units respectively, the average turnaround time works out to 15.67 time units and the average waiting time to 5.67 time units.
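A short simulation reproduces those averages, assuming all three processes arrive at time 0 (the function name and structure are my own sketch, not the document's):

```python
# Round-robin simulation: burst times [24, 3, 3], quantum 4, arrivals at t=0.
from collections import deque

def round_robin(bursts, quantum):
    remaining = list(bursts)
    finish = [0] * len(bursts)
    queue = deque(range(len(bursts)))
    clock = 0
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])
        clock += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)       # unfinished process rejoins the queue
        else:
            finish[i] = clock
    turnaround = finish           # arrival time is 0 for every process
    waiting = [t - b for t, b in zip(turnaround, bursts)]
    return sum(turnaround) / len(bursts), sum(waiting) / len(bursts)

avg_tat, avg_wait = round_robin([24, 3, 3], quantum=4)
print(round(avg_tat, 2), round(avg_wait, 2))  # 15.67 5.67
```

The Gantt order is P1(0-4), P2(4-7), P3(7-10), then P1 runs alone to completion at t=30, giving turnaround times 30, 7, and 10.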
Black-box testing is a method of software testing that examines the functionality of an application based on the specifications.
White-box testing is a testing technique that examines the program structure and derives test data from the program logic/code.
The document discusses software reuse and component-based software engineering. It describes how reuse can happen at different levels from full application systems down to individual functions. Reusing code, specifications, and designs can improve reliability, reduce risks and costs, and speed up development time. However, reuse requires an organized component library, confidence in component quality, and documentation to support adaptation. The document also outlines processes for incorporating reuse into development and enhancing the reusability of components.
The document discusses Windows XP's scheduling algorithm. It uses a priority-based, preemptive approach with 32 priority levels divided into variable and real-time classes. The scheduler ensures the highest priority thread runs by maintaining queues for each priority level and traversing from highest to lowest. Threads start at the process' base priority and may have their priority lowered after time quantums expire to limit CPU consumption of compute-intensive threads.
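The selection rule described above can be sketched as a scan over per-priority ready queues. This is a toy illustration of priority-based preemptive selection, not Windows XP's actual dispatcher code; the function and thread names are mine.

```python
# Toy sketch of priority-based selection: scan ready queues from the highest
# priority level down and dispatch the first ready thread found.

def pick_next(ready_queues):
    """ready_queues maps a priority level (0-31, higher = more urgent)
    to a FIFO list of ready thread names."""
    for prio in range(31, -1, -1):
        if ready_queues.get(prio):
            return ready_queues[prio].pop(0)
    return None  # no ready thread: run the idle thread

queues = {8: ["worker"], 24: ["audio"], 13: ["ui"]}
print(pick_next(queues))  # audio - the highest-priority ready thread wins
```

Lowering a thread's priority after its quantum expires (as the summary notes) simply moves it to a lower-numbered queue before it is re-enqueued.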
The document discusses different threading models used in operating systems including many-to-one, one-to-one, and many-to-many. It also covers POSIX threads (pthreads) which provide a standard API for multithreaded programming, and describes functions for thread creation, management, synchronization and more.
This document discusses real-time systems and provides an overview of key concepts in two sessions. The first session defines real-time systems, distinguishes between hard and soft real-time categories, and covers design approaches like super-loops and multitasking. The second session defines real-time operating systems, discusses common commercial and free RTOSs, and explains concepts like tasks, kernels, schedulers, preemptive vs. non-preemptive scheduling, and inter-task communication methods.
The document discusses pipeline hazards in computer processors and techniques to resolve them. There are three types of hazards: structural hazards which occur when multiple instructions require the same hardware resource simultaneously; data hazards which happen when an instruction depends on data from a prior instruction still in the pipeline; and control hazards which involve conditional branch instructions. Data hazards are further classified as read-after-write, write-after-read, or write-after-write depending on the specific dependency. Common resolution techniques include forwarding operands between pipeline stages, stalling the pipeline by inserting bubbles, and reordering instructions.
RAR (Read After Read) is not considered a data hazard because it does not change the order of memory accesses or introduce incorrect results. Multiple instructions can safely read the same register without interfering with each other. The three types of data hazards that can occur are RAW (Read After Write), WAR (Write After Read), and WAW (Write After Write) which all involve write operations that could potentially overwrite data before it is read.
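The classification above reduces to three set intersections. In this hedged sketch (names are my own), each instruction is a (writes, reads) pair of register sets:

```python
# Classifying the dependency between two instructions, each given as a
# (writes, reads) pair of register-name sets, per the definitions above.
def classify(first, second):
    w1, r1 = first
    w2, r2 = second
    kinds = []
    if w1 & r2:
        kinds.append("RAW")  # second reads what first wrote (true dependency)
    if r1 & w2:
        kinds.append("WAR")  # second writes what first read (anti-dependency)
    if w1 & w2:
        kinds.append("WAW")  # both write the same register (output dependency)
    return kinds or ["none (RAR reads are safe)"]

i1 = ({"r1"}, {"r2", "r3"})  # r1 = r2 + r3
i2 = ({"r4"}, {"r1", "r2"})  # r4 = r1 + r2
print(classify(i1, i2))      # ['RAW']
```

Note that i2 also reads r2, which i1 only reads: that shared read is the RAR case and, as stated above, is not a hazard.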
There are situations, called hazards, that prevent the next instruction in the instruction stream from executing during its designated cycle. There are three classes of hazards:
- Structural hazards
- Data hazards
- Branch (control) hazards
CPU Pipelining and Hazards - An Introduction (Dilum Bandara)
Pipelining is a technique used in computer architecture to overlap the execution of instructions to increase throughput. It works by breaking down instruction execution into a series of steps and allowing subsequent instructions to begin execution before previous ones complete. This allows multiple instructions to be in various stages of completion simultaneously. Pipelining improves performance but introduces hazards such as structural, data, and control hazards that can reduce the ideal speedup if not addressed properly. Control hazards due to branches are particularly challenging to handle efficiently.
The document discusses pipeline hazards including structural, data, and control hazards. It provides details on how each hazard can occur in a 5-stage pipeline and techniques to resolve them, including forwarding, stalling, and compiler scheduling. Data hazards are classified as RAW, WAW, and WAR. Control hazards from branches are reduced by computing the branch target and outcome earlier in the ID phase to minimize stalls.
The document summarizes different types of pipeline hazards that can occur in a processor pipeline: structural hazards which occur due to limited hardware resources and prevent certain combinations of instructions from executing simultaneously; data hazards which occur when instructions depend on results of previous instructions in a way exposed by pipelining; and control hazards which occur due to pipelining of branches whose target may not be known until later in the pipeline. It describes techniques for handling these hazards such as forwarding, stalling, and instruction scheduling to minimize performance impacts.
This document appears to be slides from a lecture on pipelining. It discusses various types of pipeline hazards including structural hazards where the same hardware is needed for multiple instructions, data hazards where an instruction depends on the result of a prior instruction, and control hazards caused by delays in determining branch outcomes. It provides examples of each hazard type and describes techniques for resolving the hazards such as stalling the pipeline, forwarding results between stages, and predicting branch outcomes earlier. The document emphasizes that pipelining can improve processor throughput but hazards may reduce its effectiveness if not addressed properly.
Pipelining is a technique used in microprocessors to overlap the execution of multiple instructions by dividing instruction execution into discrete stages. It allows the next instruction to begin executing before the previous one has finished. The pipeline is divided into segments that perform discrete operations concurrently. This improves processor throughput by allowing new instructions to enter the pipeline every clock cycle.
Computer Architecture and Organization (ssuserdfc773)
The document discusses instruction level parallelism (ILP) techniques used in computer processors to improve performance. It describes various ILP constraints like data dependencies and hazards that limit parallel execution. Key ILP techniques discussed are instruction pipelining, superscalar execution, out-of-order execution, register renaming, speculative execution, and branch prediction. The document also covers compiler techniques for ILP like loop unrolling and how branch prediction tables work to predict branch directions with over 80% accuracy.
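The branch prediction tables mentioned above are commonly built from 2-bit saturating counters. The sketch below is illustrative (table size and the sample branch pattern are my own assumptions); it shows why such a predictor tolerates a single loop-exit misprediction without flipping its prediction.

```python
# Illustrative 2-bit saturating-counter branch predictor. Counter values
# 0-1 predict not-taken, 2-3 predict taken; updates saturate at 0 and 3.
class TwoBitPredictor:
    def __init__(self, entries=16):
        self.table = [2] * entries  # start in "weakly taken"

    def predict(self, pc):
        return self.table[pc % len(self.table)] >= 2

    def update(self, pc, taken):
        i = pc % len(self.table)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

p = TwoBitPredictor()
outcomes = [True] * 9 + [False] + [True] * 10  # a mostly-taken loop branch
correct = 0
for taken in outcomes:
    if p.predict(0x40) == taken:
        correct += 1
    p.update(0x40, taken)
print(f"{correct}/{len(outcomes)} correct")  # 19/20 correct
```

One not-taken outcome only moves the counter from strongly to weakly taken, so the predictor stays correct on the next iteration, consistent with the over-80% accuracy figure cited above.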
This document discusses parallel processing techniques such as pipelining and vector processing to increase computational speed. It covers Flynn's classification of computer architectures, arithmetic pipelining using a floating-point adder as an example, instruction pipelining with a four-segment model, resolving data dependencies and branch difficulties in pipelines, and RISC pipeline examples addressing delayed load and branch issues. The key techniques discussed are decomposing operations into parallel suboperations, hardware interlocks, operand forwarding, and compiler assistance.
Microcontrollers: an introduction to AVR assembly language programming (SANTIAGO PABLO ALBERTO)
The document discusses branching and conditional control transfer instructions in AVR assembly language. It explains how unconditional jump instructions like jmp and call directly set the program counter to a target address. Relative jump instructions like rjmp and rcall add a signed offset to the program counter. Conditional branch instructions like breq test status register flag bits and conditionally set the program counter by adding a signed offset if the test condition is true. The document provides examples of how these instructions work at the machine level to transfer program control flow.
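The relative-jump arithmetic described above can be sketched numerically. AVR's rjmp carries a 12-bit signed word offset and computes PC <- PC + 1 + k; the helper names and example addresses below are my own illustration.

```python
# Sketch of relative-branch target formation: the signed offset is added to
# the already-incremented program counter, as described above.

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask

def rjmp_target(pc, k12):
    """pc: word address of the rjmp; k12: raw 12-bit offset field."""
    return pc + 1 + sign_extend(k12, 12)  # PC <- PC + 1 + k

print(hex(rjmp_target(0x100, 0x005)))  # forward jump:  0x106
print(hex(rjmp_target(0x100, 0xFFE)))  # backward jump: 0xff
```

Conditional branches like breq work the same way, except the offset is only added when the tested status-register flag matches; otherwise the incremented PC is used unchanged.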
Pipelining understanding (arasanlethers)
Pipelining understanding:
Pipelining is running multiple stages of the same process in parallel in a way that efficiently uses
all the available hardware while respecting the dependencies of each stage upon the previous
stages. In the laundry example, the stages are washing, drying, and folding. By starting a wash
stage as soon as the previous wash stage is moved to the dryer, the idle time of the washer is
minimized. Notice that the wash stage takes less time than the dry stage, so the wash stage must
remain idle until the dry stage finishes: the steady state throughput of the pipeline is limited by
the slowest stage in the pipeline. This can be mitigated by breaking up the bottleneck stage into
smaller sub-stages. For those less concerned with laundry-based examples, consider a video
game. The CPU computes the keyboard/mouse input each frame and moves the camera
accordingly, then the GPU takes that information and actually renders the scene; meanwhile, the
CPU has already begun calculating what's going to happen in the next frame.
How pipelining is done:
In class, we mentioned that interpreting each computer instruction is a four-step process: fetching the instruction, decoding it and reading the register, executing it, and recording the results. Each instruction may take 4 cycles to complete, but if our throughput is one instruction each cycle, then we would like to perform, on average, n instructions every n cycles. To accomplish this, we can split up an instruction's work into the 4 different steps so that other pieces of hardware work to decode, execute, and record results while the CPU performs the fetch. The latency to process each instruction is fixed at 4 cycles, so by processing a new instruction every cycle, after four cycles, one instruction has been completed and three are "in progress" (they're in the pipeline). After many cycles the steady state throughput approaches one completed instruction every cycle.
An assembly line in a auto manufacturing plant is another good example of a pipelined process.
There are many steps in the assembly of the car, each of which is assigned a stage in the pipeline.
Typically the depth of these pipelines is very large: cars are pretty complex, so there need to be a
lot of stages in the assembly line. The more stages, the longer it takes to crank the system up to a
steady state. The larger the depth, the more costly it is to turn the system around: A branch
misprediction in an instruction pipeline would be like getting one of the steps wrong in the
assembly line: all the cars affected would have to go back to the beginning of the assembly line
and be processed again.
OnLive Example[Realtime]:
OnLive is a company that allows gamers to play video games in the cloud. The games are run on
one of the company\'s server farms, and video of the game is sent back to your computer. The
idea is that even the lamest of computers can run the most highly intensive games because all the
computer does .
Unlock full featured course with 250+ Video Lectures at 20% Discount for "Learn 5 PLC's in a Day" lifetime E-Learning course for 39 USD only: https://www.udemy.com/nfi-plc-online-leaning/?couponCode=slideshare2016
Enroll for Advanced Industrial Automation Training with PLC, HMI and Drive Combo with 300+ Video Lecture for 69.3 USD only: http://online.nfiautomation.org/catalog/1769?couponCode=LEARNING_MADE_EASY
This document discusses the types of hazards that can occur in a pipelined processor and methods to avoid them. It defines a hazard as any phenomenon that can disrupt the smooth operation of the pipeline. The three main types of hazards discussed are structural hazards, data hazards, and control hazards. Structural hazards occur due to resource conflicts in the pipeline. Data hazards include read after write (RAW), write after read (WAR), and write after write (WAW) hazards, caused by instructions that depend on each other's data. Control hazards are caused by instructions that change the program counter. The document presents solutions such as operand forwarding, stalling, register renaming, and branch prediction to avoid these hazards and minimize stalls in the pipeline.
High Performance Computer Architecture
1. Pipeline Hazards and Their Resolution Mechanisms
Mr. SUBHASIS DASH
School of Computer Engineering
KIIT UNIVERSITY, BHUBANESWAR
High Performance Computer Architecture
2. Drags on Pipeline Performance
Factors that can degrade pipeline performance:
− Unbalanced stages
− Pipeline overheads
− Clock skew
− Hazards
Hazards cause the worst drag on the performance of a pipeline.
3. Pipeline Hazards
What is a pipeline hazard?
− A situation that prevents an instruction from executing during its designated clock cycle.
There are 3 classes of hazards:
− Structural Hazards
− Data Hazards
− Control Hazards
4. Structural Hazards
Arise from resource conflicts among instructions executing concurrently:
− The same resource is required by two (or more) concurrently executing instructions at the same time.
Easy ways to avoid structural hazards:
− Duplicate resources (sometimes not practical)
− Memory interleaving (lower & higher order)
Examples of resolution of structural hazards:
− An ALU to perform an arithmetic operation and a separate adder to increment the PC.
− Separate data cache and instruction cache.
6. An Example of a Structural Hazard
[Figure: five-stage pipeline diagram (Mem, Reg, ALU, DM, Reg) for a Load followed by Instructions 1–4, issued one per cycle; the Load's data-memory access and Instruction 3's instruction fetch fall in the same cycle.]
Would there be a hazard here?
7. How is it Resolved?
[Figure: the same pipeline with Instruction 3 delayed by one cycle; a bubble occupies each stage in its place.]
A pipeline can be stalled by inserting a "bubble" or NOP.
8. Performance with Stalls
Stalls degrade the performance of a pipeline:
− They result in deviation from 1 instruction executing per clock cycle.
− Let's examine by how much stalls can impact CPI…
9. Stalls and Performance
• CPI pipelined
  = Ideal CPI + Pipeline stall cycles per instruction
  = 1 + Pipeline stall cycles per instruction
• Ignoring overhead and assuming stages are balanced:

  Speedup = CPI unpipelined / (1 + Pipeline stall cycles per instruction)
10. Alternate Speedup Expression

  Clock cycle pipelined = Clock cycle unpipelined / Pipeline depth

  Pipeline depth = Clock cycle unpipelined / Clock cycle pipelined

  Speedup from pipelining
    = 1 / (1 + Pipeline stall cycles per instruction)
      × Clock cycle unpipelined / Clock cycle pipelined

If pipe stages are perfectly balanced and there is no overhead, the clock cycle of the pipelined processor is smaller than the clock cycle of the un-pipelined processor by a factor equal to the pipeline depth.
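The two speedup expressions above can be combined into a short calculation. A minimal sketch in Python (not from the slides; the function name and inputs are illustrative), assuming perfectly balanced stages and no overhead, so the clock-cycle ratio equals the pipeline depth:

```python
def speedup_from_pipelining(stall_cycles_per_instruction, pipeline_depth):
    # Speedup = [1 / (1 + stall cycles per instruction)]
    #           x (clock cycle unpipelined / clock cycle pipelined),
    # and with balanced stages the clock-cycle ratio is the pipeline depth.
    return pipeline_depth / (1.0 + stall_cycles_per_instruction)

# An ideal 5-stage pipeline with no stalls approaches a 5x speedup:
print(speedup_from_pipelining(0, 5))     # 5.0
# With 0.25 stall cycles per instruction, the speedup drops to 4x:
print(speedup_from_pipelining(0.25, 5))  # 4.0
```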
11. An Example of Performance Impact of Structural Hazard
Assume:
− Pipelined processor.
− Data references constitute 40% of the instruction mix.
− Ideal CPI of the pipelined machine is 1.
− Consider two cases: unified data and instruction cache vs. separate data and instruction caches.
What is the impact on performance?
12. An Example (cont…)
Avg. Instr. Time = CPI × Clock Cycle Time
(i) Separate caches: Avg. Instr. Time = 1 × 1 = 1
(ii) Unified cache (each data reference incurs a one-cycle stall):
  Avg. Instr. Time = (1 + 0.4 × 1) × (ideal clock cycle time)
                   = 1.4 × (ideal clock cycle time) = 1.4
Speedup = 1/1.4 ≈ 0.71
≈ 30% degradation in performance
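The arithmetic on this slide can be checked directly; a minimal sketch in Python (variable names are mine):

```python
# Slide assumptions: ideal CPI = 1, data references are 40% of the mix,
# and with a unified cache each data reference costs one stall cycle.
ideal_cpi = 1.0
data_ref_fraction = 0.4
stall_per_data_ref = 1.0

cpi_unified = ideal_cpi + data_ref_fraction * stall_per_data_ref
speedup = 1.0 / cpi_unified   # separate-cache time / unified-cache time
print(cpi_unified)            # 1.4
print(round(speedup, 2))      # 0.71, i.e. roughly 30% degradation
```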
13. Data Hazards
Occur when an instruction under execution depends on:
− Data from an instruction ahead of it in the pipeline.
Example:
  A=B+C;   IF ID EXE MEM WB
  D=A+E;      IF ID EXE MEM WB
− The dependent instruction uses old data: results in wrong computations.
14. Types of Data Hazards
Data hazards are of three types:
− Read After Write (RAW)
− Write After Read (WAR)
− Write After Write (WAW)
With an in-order execution machine:
− WAW and WAR hazards cannot occur.
In what follows, assume instruction i is issued before j.
15. Read After Write (RAW) Hazards
A hazard between two instructions i and j may occur when j attempts to read a data object that is modified by i:
− Instruction j tries to read its operand before instruction i writes it.
− j would incorrectly receive an old, stale value.
Example (i is a write instruction issued before j; j is a read instruction issued after i):
  i: ADD R1, R2, R3
  j: SUB R4, R1, R6
16. Read After Write (RAW) Hazards
[Figure: instruction i writes its range R(i); the later instruction j reads its domain D(j); the two sets overlap.]
Condition: R(i) ∩ D(j) ≠ Ø for RAW
17. RAW Dependency: More Examples
Example program (a):
− i1: load r1, addr;
− i2: add r2, r1, r1;
Program (b):
− i1: mul r1, r4, r5;
− i2: add r2, r1, r1;
In both cases, i2 does not get its operand until i1 has completed writing the result:
− In (a) this is due to a load-use dependency.
− In (b) this is due to a define-use dependency.
18. Write After Read (WAR) Hazards
A hazard may occur when j attempts to modify (write) a data object that is going to be read by i:
− Instruction j tries to write its destination operand before instruction i reads it.
− i would incorrectly receive the new value.
Example (i is a read instruction issued before j; j is a write instruction issued after i):
  i: ADD R1, R2, R3
  j: SUB R2, R4, R6
19. Write After Read (WAR) Hazards
[Figure: instruction i reads its domain D(i); the later instruction j writes its range R(j); the two sets overlap.]
Condition: D(i) ∩ R(j) ≠ Ø for WAR
20. Write After Write (WAW) Hazards
WAW hazard:
− Both i and j want to modify the same data object.
− Instruction j tries to write an operand before instruction i writes it.
− The writes are performed in the wrong order.
Example (both i and j write F1; j is issued after i):
  i: DIV F1, F2, F3
  j: SUB F1, F4, F6
(How can this happen? Only if writes can complete out of order, e.g. when the multi-cycle DIV finishes after the later, shorter SUB.)
21. Write After Write (WAW) Hazards
[Figure: both instruction i and the later instruction j write; their ranges R(i) and R(j) overlap.]
Condition: R(i) ∩ R(j) ≠ Ø for WAW
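The three intersection conditions (R(i) ∩ D(j) for RAW, D(i) ∩ R(j) for WAR, R(i) ∩ R(j) for WAW) lend themselves to a tiny checker. A sketch in Python, assuming each instruction is represented by the set of registers it reads (its domain D) and the set it writes (its range R); the representation is mine, not the slides':

```python
def classify_hazards(instr_i, instr_j):
    """Hazards between instruction i and a later instruction j.

    Each instruction is a (reads, writes) pair of register-name sets:
    reads = D(.), the domain; writes = R(.), the range."""
    d_i, r_i = instr_i
    d_j, r_j = instr_j
    hazards = set()
    if r_i & d_j:   # j reads what i writes:  R(i) ∩ D(j) ≠ Ø -> RAW
        hazards.add("RAW")
    if d_i & r_j:   # j writes what i reads:  D(i) ∩ R(j) ≠ Ø -> WAR
        hazards.add("WAR")
    if r_i & r_j:   # both write same object: R(i) ∩ R(j) ≠ Ø -> WAW
        hazards.add("WAW")
    return hazards

# i: ADD R1, R2, R3   j: SUB R4, R1, R6  -> RAW on R1
print(classify_hazards(({"R2", "R3"}, {"R1"}), ({"R1", "R6"}, {"R4"})))
# i: ADD R1, R2, R3   j: SUB R2, R4, R6  -> WAR on R2
print(classify_hazards(({"R2", "R3"}, {"R1"}), ({"R4", "R6"}, {"R2"})))
```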
22. WAR and WAW Dependency: More Examples
Example program (a):
− i1: mul r1, r2, r3;
− i2: add r2, r4, r5;
Example program (b):
− i1: mul r1, r2, r3;
− i2: add r1, r4, r5;
Both cases have a dependence between i1 and i2:
− In (a), r2 must be read by i1 before it is written into by i2 (WAR).
− In (b), r1 must be written by i2 after it has been written into by i1 (WAW).
23. Inter-Instruction Dependences
Data dependence (Read-after-Write, RAW):
  r3 ← r1 op r2
  r5 ← r3 op r4
Anti-dependence (Write-after-Read, WAR):
  r3 ← r1 op r2
  r1 ← r4 op r5
Output dependence (Write-after-Write, WAW):
  r3 ← r1 op r2
  r5 ← r3 op r4
  r3 ← r6 op r7
Control dependence
(WAR and WAW are false dependencies.)
24. Data Dependencies: Summary
Data dependencies in straight-line code:
− RAW (Read After Write): flow dependency. A true dependency that cannot be overcome. Includes load-use and define-use dependencies.
− WAR (Write After Read): anti dependency.
− WAW (Write After Write): output dependency.
WAR and WAW are false dependencies and can be eliminated by register renaming.
25. Solutions to Data Hazard
− Operand forwarding
− By S/W (NOP insertion)
− Reordering the instructions
26. Recollect Data Hazards
What causes them?
− Pipelining changes the order of read/write accesses to operands.
− The order differs from that of an unpipelined machine.
Example:
  ADD R1, R2, R3
  SUB R4, R1, R5
For MIPS, ADD writes the register in WB but SUB needs it in ID. This is a data hazard.
27. Illustration of a Data Hazard
[Figure: pipeline diagram of ADD R1, R2, R3 followed by SUB R4, R1, R5; AND R6, R1, R7; OR R8, R1, R9; XOR R10, R1, R11.]
The ADD instruction causes a hazard in the next 3 instructions because the register is not written until after those 3 read it.
28. Illustration of a Data Hazard

Cycle:            1   2   3   4     5     6     7    8    9    10   11   12
ADD R1, R2, R3    IF  ID  EXE MEM   WB
SUB R4, R1, R5        IF  ID  STALL STALL ID    EXE  MEM  WB
AND R6, R1, R7            IF  STALL STALL STALL ID   EXE  MEM  WB
OR  R8, R1, R9                                  IF   ID   EXE  MEM  WB
XOR R10, R1, R11                                     IF   ID   EXE  MEM  WB

1 + 1 + 1 = 3 stall cycles.
Total time to execute = 12 clock cycles.
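The cycle count in diagrams like this follows a simple pattern: the first instruction needs one cycle per stage, each subsequent instruction retires one cycle after its predecessor, and every stall adds a cycle. A sketch (Python; the formula is the standard one, the names are mine):

```python
def total_cycles(n_instructions, n_stages, stall_cycles):
    # Fill time for the first instruction, one extra cycle per remaining
    # instruction, plus any stall cycles inserted along the way.
    return n_stages + (n_instructions - 1) + stall_cycles

print(total_cycles(5, 5, 3))  # 12 cycles, as in the 3-stall example
print(total_cycles(5, 5, 0))  # 9 cycles, the fully forwarded case
```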
29. Forwarding
Simplest solution to a data hazard:
− Forwarding.
The result of the ADD instruction is not really needed:
− until after ADD actually produces it.
Can we move the result from the EX/MEM pipeline register to the beginning of the ALU (where SUB needs it)?
− Yes!
32. When Can We Forward?
[Figure: pipeline diagram of ADD R1, R2, R3; SUB R4, R1, R5; AND R6, R1, R7; OR R8, R1, R9; XOR R10, R1, R11 with forwarding paths drawn.]
− SUB gets its info from the EX/MEM pipe register.
− AND gets its info from the MEM/WB pipe register.
− OR gets its info by forwarding through the register file.
If a line goes "forward" in time you can do forwarding. If it is drawn backward, it is physically impossible.
33. Illustration of a Data Hazard Using Operand Forwarding

Cycle:            1   2   3   4   5   6   7   8   9
ADD R1, R2, R3    IF  ID  EXE MEM WB
SUB R4, R1, R5        IF  ID  EXE MEM WB
AND R6, R1, R7            IF  ID  EXE MEM WB
OR  R8, R1, R9                IF  ID  EXE MEM WB
XOR R10, R1, R11                  IF  ID  EXE MEM WB

With operand forwarding, pipelined execution takes 3 fewer clock cycles than without it.
Total time to execute = 9 clock cycles.
34. Illustration of a Data Hazard Using Operand Forwarding
A five-stage pipelined processor has the stages IF, ID, EXE, MEM and WB. The IF, ID, MEM and WB stages take 1 clock cycle each for any instruction. The EXE stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for MUL instructions and 6 clock cycles for DIV instructions. Operand forwarding is used in the pipeline. Calculate the number of clock cycles needed to execute the following sequence of instructions:
  MUL R2, R0, R1
  DIV R5, R3, R4
  ADD R2, R5, R2
  SUB R5, R2, R6
35. 38
Illustration of a Data Hazard Using
Operand Forwarding
MUL R2, R0, R1
DIV R5, R3, R4
ADD R2, R5, R2
SUB R5, R2, R6
Time →           1  2  3   4     5     6   7   8   9   10  11  12  13  14  15
MUL R2, R0, R1   IF ID EXE EXE   EXE   MEM WB
DIV R5, R3, R4      IF ID  STALL STALL EXE EXE EXE EXE EXE EXE MEM WB
ADD R2, R5, R2         IF  STALL STALL ID  STALL STALL STALL STALL STALL EXE MEM WB
SUB R5, R2, R6             IF    STALL STALL STALL STALL STALL ID EXE MEM WB
1 + 1 + 1 + 1 + 1 + 1 + 1 = 7 STALLS
Total time to execute = 15 clock cycles.
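The timing above can be checked with a small sketch (an assumption of this model, not stated on the slides: a single, non-pipelined EX unit and full operand forwarding out of the EX stage). `ex_end[r]` is the cycle in which register `r`'s value becomes forwardable.

```python
# EX latencies from the problem statement.
EX_CYCLES = {"MUL": 3, "DIV": 6, "ADD": 1, "SUB": 1}
program = [
    ("MUL", "R2", ("R0", "R1")),
    ("DIV", "R5", ("R3", "R4")),
    ("ADD", "R2", ("R5", "R2")),
    ("SUB", "R5", ("R2", "R6")),
]

ex_end = {}   # register -> cycle its producing EX stage finishes
ex_free = 0   # cycle when the single EX unit becomes free (structural hazard)
finish = 0
for i, (op, dst, srcs) in enumerate(program):
    id_done = i + 2                  # IF in cycle i+1, ID in cycle i+2
    start = max(id_done, ex_free, *(ex_end.get(r, 0) for r in srcs))
    end = start + EX_CYCLES[op]      # EX occupies cycles start+1 .. end
    ex_free = end
    ex_end[dst] = end
    finish = end + 2                 # MEM and WB, 1 cycle each
print(finish)
```

The model reproduces the slide's total of 15 clock cycles (MEM/WB structural hazards are ignored for simplicity).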
36. 39
Handling Data Hazards in S/W
The compiler introduces NOPs between the two
dependent instructions.
NOP = an instruction that does nothing; it keeps a gap
between the two instructions.
Detection of the dependency is left
entirely to the S/W.
Advantage: enables an easy technique
called instruction reordering.
38. 41
Data Hazard Requiring Stall
LOAD R1, 10(R2)
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9

[Pipeline diagram: each instruction flows IF ID EXE MEM WB; one STALL cycle
is inserted after LOAD so that DSUB (and the instructions after it) read R1
only after LOAD's MEM stage has produced it.]

The LOAD instruction has a latency that cannot be eliminated by operand
forwarding. The hardware stalls the pipeline to preserve correct execution;
this mechanism is called a pipeline interlock.
39. 42
Control Hazards
Result from branch and other instructions that
change the flow of a program (i.e. change PC).
Example:
Statement in line 2 is control dependent on
statement at line 1.
Until condition evaluation completes:
− It is not known whether s1 or s2 will execute next.
1: If(cond){
2: s1}
3: s2
40. 43
Can You Identify Control
Dependencies?
1: if(cond1){
2: s1;
3: if(cond2){
4: s2;}
5: }
If a branch instruction
changes the PC to its target
address then it is a taken
branch. Otherwise, if it falls
through then it is untaken.
41. 44
Control Hazard
Result of a branch instruction is not known until
the end of the MEM/EXE stage. (Why?)
Solution: Stall until result of branch
instruction is known
That an instruction is a branch is known at the
end of its ID cycle
Note: “IF” may have to be repeated
42. 45
Reducing Branch Penalty
Two approaches:
1) Move the condition comparator to the ID
stage:
Decide the branch outcome and target
address in the ID stage itself.
2) Branch prediction
43. 46
An Example of Impact of
Branch Penalty
Assume for a MIPS pipeline:
−16% of all instructions are
branches:
4% unconditional branches: 3 cycle penalty
12% conditional: 50% taken: 3 cycle penalty
Calculate the overall CPI.
44. 47
Impact of Branch Penalty
For a sequence of N instructions:
− N cycles to initiate each
− 3 * 0.04 * N = 0.12N delay cycles due to unconditional
branches
− 3 * 0.5 * 0.12 * N = 0.18N delay cycles due to
conditional taken branches
Total cycles = N + (0.12 + 0.18)N = 1.3N
Overall CPI = 1.3N / N = 1.3 cycles/instruction
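The arithmetic above can be verified with a one-liner (values taken from the slides; this check is mine, not the slides'):

```python
# CPI = 1 (ideal) + stall cycles per instruction from branches.
stall_uncond = 3 * 0.04          # 4% unconditional branches, 3-cycle penalty
stall_cond_taken = 3 * 0.5 * 0.12  # 12% conditional, half taken, 3-cycle penalty
cpi = 1 + stall_uncond + stall_cond_taken
print(round(cpi, 2))  # 1.3
```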
45. 48
Branch Hazard Solutions
#1: Stall
− Stall until the branch direction is clear, flushing the pipe
Three simplest methods of dealing
with branches:
− Flush Pipeline:
Redo the instructions following a branch
once an instruction is detected to be a
taken branch during the ID stage.
Very simple to design.
Branch penalty is fixed & cannot be
reduced by S/W.
46. 49
Branch Hazard Solutions
#2: Predict Branch Not Taken
−Execute successor instructions in sequence, as if
there were no branch
−Undo the instructions in the pipeline if the branch is actually
taken
−NOTE: If the branch is taken, we need to turn the
fetched instruction into a NOP & restart the fetch at the
target address.
47. 50
Predict Untaken Scheme
When the branch is untaken (determined during the ID phase), we have
already fetched the fall-through instruction and simply continue.
48. 51
Branch Hazard Solutions cont…
#3: Predict Branch Taken
• Treat every branch as taken
• Decode (ID) the branch instruction, then fetch (IF) from the branch target
− But the branch target address is not available at the end of IF in
MIPS
MIPS still incurs a 1-cycle branch penalty even with predict-
taken
On other machines where the branch target is known before the branch
outcome is computed, significant benefits can accrue
49. 52
Predict Taken Scheme
When the branch is determined taken during the ID phase, we have to restart
the fetch at the branch target; the instructions following the branch are
stalled by 1 clock cycle.
50. 53
Branch Hazard Solutions Cont…
#4: Delayed Branch
Instruction(s) after branch are executed anyway
Sequential successors are called branch-delay-slots
−Insert unrelated successor in the branch delay slot
branch instruction
sequential successor1
sequential successor2
........
sequential successorn
branch target if taken
−1 slot delay required in 5 stage pipeline
Branch delay of length n
51. 54
Delayed Branch
Simple idea: Put an instruction that would be executed
anyway right after a branch.
Question: What instruction do we put in the delay slot?
Answer: one that can safely be executed no matter
what the branch does.
− The compiler decides this.
Branch                      IF ID EX MEM WB
Delay-slot instruction         IF ID EX MEM WB
Branch target OR successor        IF ID EX MEM WB
52. 55
Delayed Branch
One possibility: Fill the slot with an instruction from
before the branch instruction
Example:
Restriction: branch must not depend on result of the filled
instruction
The DADD instruction is executed no matter what happens in
the branch:
− Because it is executed before the branch!
− Therefore, it can be moved
− Don’t have dependency with the branch instruction.
Before scheduling:
    DADD R1, R2, R3
    if R2 == 0 then
        <delay slot>
After scheduling:
    if R2 == 0 then
        DADD R1, R2, R3   ; fills the delay slot
53. 56
Delayed Branch
branch                    IF ID EX MEM WB
add instruction              IF ID EX MEM WB
branch target/successor         IF ID EX MEM WB
By the end of the branch’s ID stage, we know whether or not to take
the branch.
We get to execute the “DADD” instruction “for free”.
54. 57
Delayed Branch
Another possibility: Fill the slot with an independent
instruction from much before or an independent
instruction from the target of the branch instruction
Example:
Restriction: Should be OK to execute instruction
even if not taken
The DSUB instruction can be replicated into the delay slot
DSUB R4, R5, R6
...
DADD R1, R2, R3
if R1 == 0 then
delay slot
55. 58
Delayed Branch
Example:
The DSUB instruction can be replicated into the
delay slot
Improves performance: when branch is taken
DSUB R4, R5, R6
...
DADD R1, R2, R3
if R1 == 0 then
DSUB R4, R5, R6
56. 59
Delayed Branch
Yet another possibility: Fill the slot with an
independent instruction from fall through (any 1
independent instruction from branch successor) of
branch
Example:
Restriction: Should be OK to execute instruction
even if taken
The OR instruction can be moved into the delay slot
ONLY IF its execution doesn’t disrupt the program
execution
DADD R1, R2, R3
if R1 == 0 then
OR R7, R8, R9
DSUB R4, R5, R6
delay slot
57. 60
Delayed Branch
Example:
The OR instruction can be moved into the delay
slot ONLY IF its execution doesn’t disrupt the
program execution
Improves performance: when branch is not
taken
DADD R1, R2, R3
if R1 == 0 then
OR R7, R8, R9
DSUB R4, R5, R6
OR R7, R8, R9
58. 61
Delayed Branch Example

B1:    LD     R1, 0(R2)
       DSUBU  R1, R1, R3
       BEQZ   R1, L          ; taken (R1 == 0) → B3, untaken (R1 != 0) → B2
B2:    OR     R4, R5, R6     ; fall-through block
       DADDU  R10, R4, R3
B3: L: DADDU  R7, R8, R9     ; branch target

1.) BEQZ is dependent on DSUBU, and DSUBU on LD.
2.) If we knew that the branch was taken with a high probability, then
DADDU (of B3) could be moved into block B1, since it doesn’t have any
dependencies with block B2.
3.) Conversely, knowing the branch was not taken, the OR could be moved
into block B1, since it doesn’t affect anything in B3.
59. 62
Delayed Branch
Where to get instructions to fill
branch delay slots?
−Before branch instruction
−From the target address: Useful only if
branch taken.
−From fall through: Useful only if branch
not taken.
60. 63
Performance of branch with Stalls
Stalls degrade performance of a
pipeline:
−Result in deviation from 1
instruction executing/clock cycle.
−Let’s examine by how much stalls can
impact CPI…
61. 64
Stalls and Performance with
branch
• CPI pipelined
  = Ideal CPI + pipeline stall cycles per instruction
  = 1 + pipeline stall cycles per instruction
62. 65
Performance of branch instruction
Pipeline stall cycles from branches = Branch frequency * Branch penalty

Pipeline speedup = Pipeline depth / (1 + pipeline stall cycles from branches)
                 = Pipeline depth / (1 + Branch frequency * Branch penalty)
63. 66
Program Dependences Can
Cause Hazards!
Hazards can be caused by dependences
within a program.
There are three main types of
dependences in a program:
− Data dependence
− Name dependence (false dependency)
− Control dependence
64. 67
Types of Name Dependences
Two types of name dependences:
− Anti-dependence
− Output dependence
65. 68
Anti-Dependence or (WAR)
Anti-dependence occurs between two
instructions i and j, iff:
− j writes to a register or memory location that i
reads.
− Original ordering must be preserved to ensure
that i reads the correct value.
Example:
− ADD F0,F6,F8
− SUB F8,F4,F5
66. 69
Output Dependence or (WAW)
Output dependence occurs between
two instructions i and j, iff:
− The two instructions write to the same
memory location.
Ordering of the instructions must be
preserved to ensure:
− Finally written value corresponds to j.
Example: ADD F6, F0, F8
         MUL F6, F10, F8
67. 70
Exercise
Identify all the dependences in
the following C code:
1. a=b+c;
2. b=c+d;
3. a=a+c;
4. c=b+a;
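One way to work the exercise is mechanically, tracking the last writer of each variable and the reads since that write. The sketch below (mine, not from the slides) encodes the four statements and reports the RAW, WAR, and WAW dependences.

```python
# Each statement: (number, variable written, variables read).
stmts = [
    (1, "a", ["b", "c"]),
    (2, "b", ["c", "d"]),
    (3, "a", ["a", "c"]),
    (4, "c", ["b", "a"]),
]

raw, war, waw = [], [], []
last_write = {}          # var -> statement that last wrote it
reads_since_write = {}   # var -> statements that read it since its last write
for n, w, reads in stmts:
    for r in reads:      # reads use values from before this statement's write
        if r in last_write:
            raw.append((last_write[r], n, r))
        reads_since_write.setdefault(r, []).append(n)
    if w in last_write:
        waw.append((last_write[w], n, w))
    for m in reads_since_write.get(w, []):
        if m != n:       # a statement reading its own destination is RAW, not WAR
            war.append((m, n, w))
    last_write[w] = n
    reads_since_write[w] = []

print("RAW:", raw)   # true dependences
print("WAR:", war)   # anti-dependences
print("WAW:", waw)   # output dependences
```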
68. 71
A Solution to WAR and WAW
Hazards
Rename Registers
−i1: mul r1, r2, r3;
−i2: add r1, r4, r5;
Register renaming can get rid of most false
dependencies:
−Compiler can do register renaming in the register
allocation process (i.e., the process that assigns
registers to variables).
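A minimal sketch of the renaming idea (not the compiler's actual register-allocation algorithm): give every write a fresh physical name, SSA-style, so WAW and WAR name dependences disappear while RAW dependences are preserved through the mapping. The `p0`, `p1`, … names are hypothetical.

```python
def rename(instrs):
    """instrs: list of (dest, [sources]); returns the renamed program."""
    current = {}                  # architectural -> latest physical name
    out = []
    for k, (dst, srcs) in enumerate(instrs):
        phys_srcs = [current.get(s, s) for s in srcs]
        phys_dst = f"p{k}"        # fresh physical name for each write
        current[dst] = phys_dst
        out.append((phys_dst, phys_srcs))
    return out

# i1 and i2 both write r1 (a WAW hazard); after renaming they write
# different physical registers and can complete in either order.
renamed = rename([("r1", ["r2", "r3"]), ("r1", ["r4", "r5"])])
print(renamed)
```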
69. 72
Out-of-order Pipelining
[Figure: out-of-order pipeline — IF, ID, and RD stages feed parallel
functional units in EX (INT, Fadd1–Fadd2, Fmult1–Fmult3, LD/ST),
followed by WB.]

Program Order:
Ia: F1 ← F2 x F3
. . . . .
Ib: F1 ← F4 + F5

Out-of-order WB:
Ib: F1 ← “F4 + F5”
. . . . . .
Ia: F1 ← “F2 x F3”
70. 73
Branch Prediction
Mr. SUBHASIS DASH
School of Computer Engineering
KIIT UNIVERSITY
BHUBANESWAR
High Performance
Computer Architecture
71. 74
Dynamic Branch Prediction
• KEY IDEA: Hope that branch assumption
is correct.
• If yes, then we’ve gained a performance
improvement.
• Otherwise, discard instructions
• program is still correct, all we’ve done is “waste” a
clock cycle.
Two approaches:
− Direction-based prediction
− History/profile-based prediction
72. 75
Direction Based Prediction
• Simple to implement
• However, often branch behavior is
variable (dynamic).
• Can’t capture such behavior at compile
time with simple direction based
prediction!
• Need history ( profile)-based
prediction.
73. 76
History-based Branch
Prediction
An important example is State-based
branch prediction:
Needs 2 parts:
− “Predictor” to guess where/if
instruction will branch (and to where)
− “Recovery Mechanism”: i.e. a way to fix
mistakes
74. 77
History-based Branch
Prediction cont…
One bit predictor:
− Use result from last time this
instruction executed.
Problem:
− Even if a branch is almost always taken,
we will be wrong at least twice.
− If the branch alternates between taken and
not taken,
we get 0% accuracy.
75. 78
1-Bit Prediction Example
Almost All Taken
Let initial value = NT; the actual outcomes
of the branches are T, T, T, T, T, NT
− Predictions are:
NT, T, T, T, T, T
2 wrong, 4 correct = 66% accuracy
2-bit predictors can do even better
In general, can have k-bit predictors.
76. 79
1-Bit Prediction Example
Almost All Untaken
Let initial value = T; the actual outcomes of
the branches are NT, NT, NT, NT, NT, T
− Predictions are:
T, NT, NT, NT, NT, NT
2 wrong, 4 correct = 66% accuracy
2-bit predictors can do even better
In general, can have k-bit predictors.
77. 80
1-Bit Prediction Example
A Mixed Pattern
Let initial value = T; the actual outcomes
of the branches are NT, NT, NT, T, T, T
− Predictions are:
T, NT, NT, NT, T, T
2 wrong, 4 correct = 66% accuracy
2-bit predictors can do even better
In general, can have k-bit predictors.
78. 81
1-Bit Prediction Example
Alternating Taken/Untaken
Let initial value = T; the actual outcomes
of the branches are NT, T, NT, T
− Predictions are:
T, NT, T, NT
4 wrong, 0 correct = 0% accuracy
2-bit predictors can do even better
In general, can have k-bit predictors.
79. 82
1-bit Predictor
Set bit to 1 or 0:
−Depending on whether branch Taken
(T) or Not-taken (NT)
−Pipeline checks bit value and predicts
−If incorrect then need to discard
speculatively executed instruction
Actual outcome used to set the
bit value.
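The 1-bit predictor on the preceding slides can be sketched in a few lines (my simulation, not from the slides): predict the branch's last outcome, then update with the actual outcome.

```python
def one_bit_predict(initial, outcomes):
    """initial: 'T' or 'NT'; outcomes: actual branch results in order."""
    state, predictions, correct = initial, [], 0
    for actual in outcomes:
        predictions.append(state)     # predict the last outcome seen
        if state == actual:
            correct += 1
        state = actual                # update the 1-bit history
    return predictions, correct

# "Almost all taken" example from the slides: 4 of 6 correct.
print(one_bit_predict("NT", ["T", "T", "T", "T", "T", "NT"]))

# Alternating branch: every prediction is wrong.
print(one_bit_predict("T", ["NT", "T", "NT", "T"]))
```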
80. 83
• Change prediction only if twice mispredicted:
• Adds hysteresis to decision making process
2-bit Dynamic Branch
Prediction Scheme
[State diagram: four states — 11 “Predict Taken”, 10 “Predict Taken”,
01 “Predict Not Taken”, 00 “Predict Not Taken”. A taken branch (T) moves
the state toward 11; a not-taken branch (NT) moves it toward 00.]
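Encoding the four states as a saturating counter (11 = 3 down to 00 = 0) reproduces the diagram; this sketch is my rendering of it, with "predict taken" for the two upper states.

```python
def two_bit_predict(initial_state, outcomes):
    """initial_state: 0..3 (00..11); outcomes: actual results, 'T'/'NT'."""
    state, predictions, correct = initial_state, [], 0
    for actual in outcomes:
        pred = "T" if state >= 2 else "NT"
        predictions.append(pred)
        if pred == actual:
            correct += 1
        # move toward 11 on a taken branch, toward 00 on a not-taken one
        state = min(3, state + 1) if actual == "T" else max(0, state - 1)
    return predictions, correct

# Start strongly taken (state 11) with outcomes NT, NT, NT, T, T, T.
print(two_bit_predict(3, ["NT", "NT", "NT", "T", "T", "T"]))
```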
81. 84
2-Bit Prediction Example
Let initial value = T (state 11); the actual outcomes
of the branches are NT, NT, NT, T, T, T
− Predictions are:
T, T, NT, NT, NT, T
4 wrong, 2 correct = 33% accuracy
2-bit predictors can do even better
In general, can have k-bit predictors.
82. 85
2-Bit Prediction Example
Let initial value = NT (state 00); the actual outcomes
of the branches are NT, NT, NT, T, T, T
− Predictions are:
NT, NT, NT, NT, NT, T
2 wrong, 4 correct = 66% accuracy
2-bit predictors can do even better
In general, can have k-bit predictors.
83. 86
2-Bit Prediction Example
Let initial value = T, actual outcome
of branches is- NT, T, NT, T
− Predictions are:
T, T, T, T
2 wrong (in red), 2 correct = 50% accuracy
2-bit predictors can do even better
In general, can have k-bit predictors.
84. 87
2-Bit Prediction Example
Let initial value = NT, actual outcome
of branches is- NT, T, NT, T
− Predictions are:
NT, NT, NT, NT
2 wrong (in red), 2 correct = 50% accuracy
2-bit predictors can do even better
In general, can have k-bit predictors.
85. 88
An Example of Computing
Performance
Program assumptions:
− 23% loads and in ½ of cases, next
instruction uses load value
− 13% stores
− 19% conditional branches
− 2% unconditional branches
− 43% other
86. 89
Example
cont…
Machine Assumptions:
−5 stage pipe
Penalty of 1 cycle on use of load value
immediately after a load.
Jumps are resolved in ID stage for a
1 cycle branch penalty.
75% branch prediction accuracy.
1 cycle delay on misprediction.
Calculate the CPI.
87. 90
Example cont…
CPI penalty calculation:
− Loads:
50% of the 23% loads have a 1 cycle penalty: 0.5 * 0.23 = 0.115
− Jumps:
All of the 2% of jumps have 1 cycle penalty: 0.02*1 = 0.02
− Conditional Branches:
25% of the 19% are mispredicted, have a 1 cycle penalty:
0.25*0.19*1 = 0.0475
Total Penalty: 0.115 + 0.02 + 0.0475 = 0.1825
Average CPI: 1 + 0.1825 = 1.1825
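The same calculation as a quick check (values from the slides; the snippet itself is mine):

```python
# Per-instruction stall penalties from the machine and program assumptions.
load_penalty = 0.5 * 0.23 * 1      # half of the 23% loads stall 1 cycle
jump_penalty = 0.02 * 1            # all 2% jumps pay a 1-cycle penalty
branch_penalty = 0.25 * 0.19 * 1   # 25% of conditional branches mispredicted
total = load_penalty + jump_penalty + branch_penalty
cpi = 1 + total
print(round(cpi, 4))  # 1.1825
```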
88. 91
Loop-level Parallelism
It may be possible to execute
different iterations of a loop in
parallel.
Example:
− For(i=0;i<1000;i++){
− a[i]=a[i]+b[i];
− b[i]=b[i]*2;
− }
89. 92
Problems in Exploiting Loop-
level Parallelism
• Loop Carried Dependences:
• A dependence across different
iterations of a loop.
Loop Independent Dependences:
− A dependence within the body of the
loop itself (i.e. within one iteration).
92. 95
Static Loop Unrolling
- A high proportion of loop instructions are
loop management instructions.
- Eliminating this overhead can significantly
increase the performance of the loop.
- for (i = 1000; i > 0; i--) {
-     a[i] = a[i] + c;
- }
95. 98
Static Loop Unrolling
cont…
Loop:  L.D     F0, 0(R1)
       ADD.D   F4, F0, F2
       S.D     F4, 0(R1)
       L.D     F6, -8(R1)
       ADD.D   F8, F6, F2
       S.D     F8, -8(R1)
       L.D     F10, -16(R1)
       ADD.D   F12, F10, F2
       S.D     F12, -16(R1)
       L.D     F14, -24(R1)
       ADD.D   F16, F14, F2
       S.D     F16, -24(R1)
       DADDUI  R1, R1, #-32
       BNE     R1, R2, Loop

n loop bodies for n = 4, with the loop-overhead instructions adjusted.
Note the adjustments for the store and load offsets.
Note the renamed registers: this eliminates dependencies between
each of the n loop bodies of different iterations.
97. 100
Software Pipelining
cont…
Central idea: reorganize loops
− Each iteration is made from instructions
chosen from different iterations of the
original loop.
[Figure: a software pipeline — each software-pipelined iteration overlaps
instructions drawn from several original iterations (i0, i1, i2, i3, i4, i5, …).]
98. 101
Software Pipelining
cont…
Observation: if iterations from loops
are independent, then can get more
ILP by taking instructions from
different iterations
Exactly as it happens in a
hardware pipeline:
−In each iteration of a software pipelined
code, some instruction of some iteration
of the original loop is executed.
99. 102
Static Loop Unrolling Example
Loop:  L.D     F0, 0(R1)    ; F0 = array elem.
       ADD.D   F4, F0, F2   ; add scalar in F2
       S.D     F4, 0(R1)    ; store result
       DADDUI  R1, R1, #-8  ; decrement ptr
       BNE     R1, R2, Loop ; branch if R1 != R2
100. 103
Software Pipelining
cont…
- How is this done?
1.) Unroll the loop body with an unroll
factor of n (we have taken n = 3 for our example).
2.) Select the order of instructions from
different iterations to pipeline.
3.) “Paste” instructions from different
iterations into the new pipelined loop body.
101. 104
Software Pipelining: Step 1
Unrolled 3 times:

Iteration i:      L.D    F0, 0(R1)
                  ADD    F4, F0, F2
                  S.D    F4, 0(R1)
Iteration i + 1:  L.D    F6, -8(R1)
                  ADD    F8, F6, F2
                  S.D    F8, -8(R1)
Iteration i + 2:  L.D    F10, -16(R1)
                  ADD    F12, F10, F2
                  S.D    F12, -16(R1)
                  DSUBUI R1, R1, #24
                  BNEZ   R1, LOOP

Note:
1.) We are unrolling the loop, hence no per-copy loop-overhead
instructions are needed!
2.) A single loop body of the restructured loop will contain
instructions from different iterations of the original loop body.
102. 105
Software Pipelining: Step 2
Before: unrolled 3 times

Iteration i:      L.D    F0, 0(R1)
                  ADD    F4, F0, F2
                  S.D    F4, 0(R1)
Iteration i + 1:  L.D    F6, -8(R1)
                  ADD    F8, F6, F2
                  S.D    F8, -8(R1)
Iteration i + 2:  L.D    F10, -16(R1)
                  ADD    F12, F10, F2
                  S.D    F12, -16(R1)
                  DSUBUI R1, R1, #24
                  BNEZ   R1, LOOP

Notes:
1.) We’ll select the following order in our pipelined loop:
1.) S.D (from iteration i), 2.) ADD (from iteration i + 1),
3.) L.D (from iteration i + 2).
2.) Each instruction (L.D, ADD, S.D) must be selected at least
once to make sure that we don’t leave out any instructions of
the original loop in the pipelined loop.
104. 107
Software Pipelining: Step 3
Iteration i:      L.D    F0, 0(R1)
                  ADD.D  F4, F0, F2
                  S.D    F4, 0(R1)
Iteration i + 1:  L.D    F6, -8(R1)
                  ADD.D  F8, F6, F2
                  S.D    F8, -8(R1)
Iteration i + 2:  L.D    F10, -16(R1)
                  ADD.D  F12, F10, F2
                  S.D    F12, -16(R1)

The Pipelined Loop:

Loop:  S.D    F4, 0(R1)     ; 1.) store from iteration i
       ADD.D  F4, F0, F2    ; 2.) add from iteration i + 1
       L.D    F0, -16(R1)   ; 3.) load from iteration i + 2
       DADDUI R1, R1, #-8
       BNE    R1, R2, Loop

• Symbolic Loop Unrolling
  – Maximize result-use distance
  – Less code space than unrolling
  – Fill & drain pipe only once per loop
    vs. once per each unrolled iteration in loop unrolling
105. 108
Software Pipelining: Step 4
Loop:  S.D    F4, 16(R1)   ; M[ i ]
       ADD.D  F4, F0, F2   ; M[ i – 1 ]
       L.D    F0, 0(R1)    ; M[ i – 2 ]
       DADDUI R1, R1, #-8
       BNE    R1, R2, Loop

Preheader: instructions to fill the “software pipeline”.
Pipelined loop body (above).
Postheader: instructions to drain the “software pipeline”.
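The same store/add/load rotation can be sketched in Python (an analogue of the MIPS loop, with arrays standing in for registers and memory; the function name is mine). Each kernel iteration stores one result, adds for the next element, and loads the one after that.

```python
def add_scalar_software_pipelined(a, c):
    """Add scalar c to every element of a, software-pipelined by 3 stages."""
    n = len(a)
    if n < 3:                      # too short to fill the pipeline
        return [x + c for x in a]
    out = list(a)
    # preheader: fill the software pipeline (work downward, as the MIPS code does)
    f0 = out[n - 1]                # L.D
    f4 = f0 + c                    # ADD.D
    f0 = out[n - 2]                # L.D for the next iteration
    # pipelined loop body
    i = n - 1
    while i >= 2:
        out[i] = f4                # S.D   (result of iteration i)
        f4 = f0 + c                # ADD.D (for iteration i - 1)
        f0 = out[i - 2]            # L.D   (for iteration i - 2)
        i -= 1
    # postheader: drain the software pipeline
    out[1] = f4
    out[0] = f0 + c
    return out

print(add_scalar_software_pipelined([0, 1, 2, 3, 4], 10))  # [10, 11, 12, 13, 14]
```

Note how the store in each kernel iteration uses a value loaded two iterations earlier, maximizing the result-use distance just as the MIPS version does.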
106. 109
Software Pipelining Issues
Register management can be tricky.
− In more complex examples, we may need to
increase the iterations between when data
is read and when the results are used.
Optimal software pipelining has been
shown to be an NP-complete problem:
− Present solutions are based on heuristics.
107. 110
Software Pipelining versus Loop
Unrolling
Software pipelining takes less code
space.
Software pipelining and loop unrolling
reduce different types of inefficiencies:
− Loop unrolling reduces loop management
overheads.
− Software pipelining allows a pipeline to run
at full efficiency by eliminating loop-independent
dependencies.
108. 111
Advantages of
Dynamic Scheduling
Can handle dependences unknown at
compile time:
−E.g. dependences involving memory references.
Simplifies the compiler.
Allows code compiled for one pipeline to
run efficiently on a different pipeline.
Hardware speculation can be used:
−Can lead to further performance advantages,
builds on dynamic scheduling.
109. 112
Overview of Dynamic
Instruction Scheduling
We shall discuss two schemes for
implementing dynamic scheduling:
−Scoreboarding: First used in the 1964 CDC
6600 computer.
−Tomasulo’s Algorithm: Implemented for
the FP unit of the IBM 360/91 in 1966.
Since scoreboarding is a little closer to
in-order execution, we’ll look at it first.
110. 113
A Point to Note About
Dynamic Scheduling
WAR and WAW hazards that did
not exist in an in-order pipeline:
−Can arise in a dynamically scheduled
processor.
112. 115
Scoreboarding
The 5 Stage MIPS Pipeline
Split the ID pipe stage of simple
5-stage pipeline into 2 stages:
−Issue: Decode instructions, check
for structural hazards.
−Read operands: Wait until no data
hazards, then read operands.
114. 117
Scoreboarding Concepts
We had observed that WAR and
WAW hazards can occur in out-of-order execution:
−Instructions involved in a dependence
are stalled,
−But, instructions having no dependence
are allowed to continue.
−Different units are kept as busy as
possible.
115. 118
Scoreboarding Concepts
Essence of scoreboarding:
−Execute instructions as early as
possible.
−When an instruction is stalled,
Later instructions are issued and
executed if they do not depend on
any active or stalled instruction.
116. 119
A Few More Basic
Scoreboarding Concepts
Every instruction goes through the
scoreboard:
− Scoreboard constructs the data
dependences of the instruction.
− Scoreboard decides when an instruction
can execute.
− Scoreboard also controls when an
instruction can write its results into the
destination register.
117. 120
Scoreboarding
Out-of-order execution requires multiple
instructions to be in the EX stage
simultaneously:
− Achieved with multiple functional units,
along with pipelined functional units.
All instructions go through the
scoreboard:
− Centralized control of issue, operand
reading, execution and writeback.
− All hazard resolution is centralized in the
scoreboard as well.
118. 121
A Scoreboard for MIPS
[Figure: the register file connects over data buses (a source of structural
hazard) to the functional units — Integer Unit, FP Add, FP Divide, and two
FP Multipliers — while the Scoreboard exchanges control/status signals with
the registers and every functional unit.]
119. 122
4 Steps of Execution with
Scoreboarding
1. Issue: when a functional unit for an
instruction is free and no other active
instruction has the same destination
register:
• Avoids structural and WAW hazards.
2. Read operands: when all source operands
are available:
• Note: forwarding not used.
• A source operand is available if no earlier
issued active instruction is going to write it.
• Thus resolves RAW hazards dynamically.
120. 123
Steps in Execution with
Scoreboarding
3. Execution: begins when the functional
unit receives its operands; the scoreboard is
notified when execution completes.
4. Write Result: after WAR hazards have
been resolved.
121. 124
Different parts of Scoreboard
Instruction Status Unit:
− Indicates which of the 4 phases the instruction is in.
Functional Unit Status:
− Indicates the state of each functional unit; each
functional unit has 9 fields.
1. Busy: indicates whether the functional unit is busy or free.
2. Op: indicates the type of operation to be performed in the unit.
3. Fi: indicates the destination register.
4. Fj, Fk: indicate the source registers.
5. Qj, Qk: indicate the functional units producing the source
register values.
6. Rj, Rk: flags indicating whether Fj, Fk are ready & not yet read:
   − If the operand is ready & not yet read, then YES.
   − If the operand is not ready, then NO.
   − If the operand has already been read, then NO.
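The scoreboard's issue test can be sketched from these fields (my simplified field names and unit names, not a full scoreboard): an instruction issues only if its functional unit is free (no structural hazard) and no active instruction already has the same destination register (no WAW hazard).

```python
def can_issue(unit, dest, fu_status, register_result):
    """Issue check: functional unit free AND no pending write to dest."""
    return (not fu_status[unit]["Busy"]) and (dest not in register_result)

fu_status = {"Integer": {"Busy": False}, "Add": {"Busy": False}}
register_result = {}            # Fi -> functional unit that will write it

# LOAD F6 issues on the free integer unit ...
assert can_issue("Integer", "F6", fu_status, register_result)
fu_status["Integer"]["Busy"] = True
register_result["F6"] = "Integer"

# ... after which a second LOAD stalls (unit busy), and an ADD writing F6
# stalls too (WAW on F6), even though the adder itself is free.
assert not can_issue("Integer", "F2", fu_status, register_result)
assert not can_issue("Add", "F6", fu_status, register_result)
print("issue checks passed")
```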
122. 125
Different parts of Scoreboard
Register Result Status Unit :-
− Indicates which functional unit will write each
register, if an active instruction has the register as
its destination.
− The field is set to blank whenever there is no
pending instruction that will write that register.
123. 126
Steps in Execution with Scoreboard Approach
• Write the steps of execution with the scoreboard approach.
Assume we have 2 multipliers, 1 adder, 1 divider & 1 integer
unit. The following set of MIPS instructions is going to be
executed in a pipelined system.
• Example:
• LOAD F6, 34(R2)
• LOAD F2, 45(R3)
• MUL F0, F2, F4
• SUB F8, F6, F2
• DIV F10, F0, F6
• ADD F6, F8, F2
• Consider the following execution status for the above instruction set:
• The 1st LOAD instruction has completed its write-result phase.
• The 2nd LOAD instruction has completed its execute phase.
• The other remaining instructions are in their issue phase.
• Show the status of the instructions, functional units, & register
result status using dynamic scheduling with the scoreboard
approach.
124. 127
Steps in Execution with Scoreboard Approach
Write the steps of execution with the scoreboard approach.
Assume we have 2 multipliers, 1 adder, 1 divider & 1 integer
unit. The following set of MIPS instructions is going to be
executed in a pipelined system.
Example:
• LOAD F6, 34(R2)
• LOAD F2, 45(R3)
• MUL F0, F2, F4
• SUB F8, F6, F2
• DIV F10, F0, F6
• ADD F6, F8, F2
Consider the following execution status for the above instruction set:
• The two LOAD instructions & the SUB instruction have completed their
write-result phase.
• The MUL & DIV instructions are ready for their write-result phase.
• The ADD instruction is in its execution phase.
Show the status of the instructions, functional units, & register result
status using dynamic scheduling with the scoreboard approach.
125. 128
Steps in Execution with Scoreboard Approach
Write the steps of execution with the scoreboard approach.
Assume we have 2 multipliers, 1 adder, 1 divider & 1 integer
unit. The following set of MIPS instructions is going to be
executed in a pipelined system.
Example:
• LOAD F6, 34(R2)
• LOAD F2, 45(R3)
• MUL F0, F2, F4
• SUB F8, F6, F2
• DIV F10, F0, F6
• ADD F6, F8, F2
Consider the following execution status for the above instruction set:
• The two LOAD, MUL, SUB & ADD instructions have completed their write-
result phase.
• The DIV instruction is ready to write its result.
Show the status of the instructions, functional units, & register result
status using dynamic scheduling with the scoreboard approach.
127. 130
Scoreboard Example
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2
LD   F2  45+ R3
MULT F0  F2  F4
SUB  F8  F6  F2
DIV  F10 F0  F6
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer No
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
FU
128. 131
Scoreboard Example Cycle 1
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1
LD   F2  45+ R3
MULT F0  F2  F4
SUB  F8  F6  F2
DIV  F10 F0  F6
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer Yes Load F6 R2 Yes
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
1 FU Integer
Issue
LOAD
129. 132
Scoreboard Example Cycle 2
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1         2
LD   F2  45+ R3
MULT F0  F2  F4
SUB  F8  F6  F2
DIV  F10 F0  F6
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer Yes Load F6 R2 No
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Integer
Note: Can’t issue I2
because Integer unit
is busy. Can’t issue
next instruction
MULT due to in-order
issue
130. 133
Scoreboard Example Cycle 3
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1         2                3
LD   F2  45+ R3
MULT F0  F2  F4
SUB  F8  F6  F2
DIV  F10 F0  F6
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer Yes Load F6 R2 No
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
3 FU Integer
Executing LOAD
Instruction
131. 134
Scoreboard Example Cycle 4
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1         2                3                 4
LD   F2  45+ R3
MULT F0  F2  F4
SUB  F8  F6  F2
DIV  F10 F0  F6
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer Yes Load F6 R2 No
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
4 FU
Writing Result to F6
( LOAD Instruction )
132. 135
Scoreboard Example Cycle 5
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1         2                3                 4
LD   F2  45+ R3               5
MULT F0  F2  F4
SUB  F8  F6  F2
DIV  F10 F0  F6
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer Yes Load F2 R3 Yes
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Integer
Issue LD #2 since
integer unit is now
free.
133. 136
Scoreboard Example Cycle 6
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1         2                3                 4
LD   F2  45+ R3               5         6
MULT F0  F2  F4               6
SUB  F8  F6  F2
DIV  F10 F0  F6
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer Yes Load F2 R3 No
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
6 FU Mult Integer
Issue MULT instruction.
134. 137
Scoreboard Example Cycle 7
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1         2                3                 4
LD   F2  45+ R3               5         6                7
MULT F0  F2  F4               6
SUB  F8  F6  F2               7
DIV  F10 F0  F6
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer Yes Load F2 R3 No
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add Yes Subd F8 F6 F2 Integer Yes No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult Integer Add
MULT can’t read its
operands (F2) because
LD #2 hasn’t finished.
135. 138
Scoreboard Example Cycle 8
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1         2                3                 4
LD   F2  45+ R3               5         6                7                 8
MULT F0  F2  F4               6
SUB  F8  F6  F2               7
DIV  F10 F0  F6               8
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer No
Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
Add Yes Sub F8 F6 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 Add Divide
DIV issues.
MULT and SUB
both waiting for F2.
LD #2 writes F2.
136. 139
Scoreboard Example Cycle 9
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1         2                3                 4
LD   F2  45+ R3               5         6                7                 8
MULT F0  F2  F4               6         9
SUB  F8  F6  F2               7         9
DIV  F10 F0  F6               8
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer No
10 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
2 Add Yes Sub F8 F6 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
9 FU Mult1 Add Divide
Now MULT and SUB can both read
F2 as it is now available. ADD
can’t be issued because SUB uses
the adder.
137. 140
Scoreboard Example Cycle 11
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1         2                3                 4
LD   F2  45+ R3               5         6                7                 8
MULT F0  F2  F4               6         9
SUB  F8  F6  F2               7         9               11
DIV  F10 F0  F6               8
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer No
8 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
0 Add Yes Sub F8 F6 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
11 FU Mult1 Add Divide
Add takes 2 cycles, so
nothing happens in cycle 10.
MULT & SUB continue.
138. 141
Scoreboard Example Cycle 12
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1         2                3                 4
LD   F2  45+ R3               5         6                7                 8
MULT F0  F2  F4               6         9
SUB  F8  F6  F2               7         9               11                12
DIV  F10 F0  F6               8
ADD  F6  F8  F2
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer No
7 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
12 FU Mult1 Divide
SUB finishes.
DIV waiting for F0.
139. 142
Scoreboard Example Cycle 13
Instruction status          Issue  Read operands  Execution complete  Write result
LD   F6  34+ R2               1         2                3                 4
LD   F2  45+ R3               5         6                7                 8
MULT F0  F2  F4               6         9
SUB  F8  F6  F2               7         9               11                12
DIV  F10 F0  F6               8
ADD  F6  F8  F2              13
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time  Name  Busy  Op     Fi    Fj  Fk  Qj        Qk        Rj   Rk
Integer No
6 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
13 FU Mult1 Add Divide
Now ADD can be issued because
SUB has completed and freed the adder.
140. 143
Scoreboard Example Cycle 14
Instruction status
Instruction j k   Issue  Read operands  Execution complete  Write result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULT F0 F2 F4 6 9
SUB F8 F6 F2 7 9 11 12
DIV F10 F0 F6 8
ADD F6 F8 F2 13 14
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
5 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
2 Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
14 FU Mult1 Add Divide
141. 144
Scoreboard Example Cycle 15
Instruction status
Instruction j k   Issue  Read operands  Execution complete  Write result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULT F0 F2 F4 6 9
SUB F8 F6 F2 7 9 11 12
DIV F10 F0 F6 8
ADD F6 F8 F2 13 14
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
4 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
1 Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
15 FU Mult1 Add Divide
ADD takes 2 cycles, so no
change
142. 145
Scoreboard Example Cycle 16
Instruction status
Instruction j k   Issue  Read operands  Execution complete  Write result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULT F0 F2 F4 6 9
SUB F8 F6 F2 7 9 11 12
DIV F10 F0 F6 8
ADD F6 F8 F2 13 14 16
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
3 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
0 Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
16 FU Mult1 Add Divide
ADD completes
execution, but MULT and
DIV go on
143. 146
Scoreboard Example Cycle 17
Instruction status
Instruction j k   Issue  Read operands  Execution complete  Write result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULT F0 F2 F4 6 9
SUB F8 F6 F2 7 9 11 12
DIV F10 F0 F6 8
ADD F6 F8 F2 13 14 16
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
2 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
17 FU Mult1 Add Divide
ADD stalls: it cannot write
back F6 because of a WAR hazard with
DIV, which has not yet read F6.
MULT and DIV continue.
144. 147
Scoreboard Example Cycle 18
Instruction status
Instruction j k   Issue  Read operands  Execution complete  Write result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULT F0 F2 F4 6 9
SUB F8 F6 F2 7 9 11 12
DIV F10 F0 F6 8
ADD F6 F8 F2 13 14 16
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
1 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
18 FU Mult1 Add Divide
MULT and
DIV continue
145. 148
Scoreboard Example Cycle 19
Instruction status
Instruction j k   Issue  Read operands  Execution complete  Write result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
0 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
19 FU Mult1 Add Divide
MULT completes
after 10 cycles
146. 149
Scoreboard Example Cycle 20
Instruction j k   Issue  Read operands  Execution complete  Write result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Yes Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
20 FU Add Divide
MULT completes and
writes to F0
147. 150
Scoreboard Example Cycle 21
Instruction j k   Issue  Read operands  Execution complete  Write result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21
ADDD F6 F8 F2 13 14 16
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 No No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
21 FU Add Divide
Now DIV reads because
F0 is available
148. 151
Scoreboard Example Cycle 22
Instruction j k   Issue  Read operands  Execution complete  Write result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21
ADDD F6 F8 F2 13 14 16 22
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
Divide Yes Div F10 F0 F6 No No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
22 FU Divide
ADD finally writes its result because
the WAR hazard is removed: DIV has now read F6.
149. 152
Scoreboard Example Cycle 61
Instruction j k   Issue  Read operands  Execution complete  Write result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21 61
ADDD F6 F8 F2 13 14 16 22
Functional unit status   dest  S1  S2  FU for j  FU for k  Fj?  Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
Divide Yes Div F10 F0 F6 No No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
61 FU Divide
DIVD completes execution
(40 cycles)
150. 153
Scoreboard Example Cycle 62
Instruction status
Instruction j k   Issue  Read operands  Execution complete  Write result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21 61 62
ADDD F6 F8 F2 13 14 16 22
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
0 Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
62 FU
Execution is finished
151. 154
An Assessment of Scoreboarding
Pro: A scoreboard effectively handles
true data dependences:
−Minimizes the number of stalls due to true
data dependences.
Con: Antidependences and output
dependences (WAR and WAW hazards)
are still handled using stalls:
− These could have been handled better.
153. 156
A More Sophisticated
Approach: Tomasulo’s
Algorithm
Developed for IBM 360/91:
−Goal: To keep the floating point pipeline as
busy as possible.
−This led Tomasulo to try to figure out how to
achieve renaming in hardware!
The descendants of this have flourished!
−Alpha 21264, HP PA-8000, MIPS R10000,
Pentium III, PowerPC 604, Pentium 4…
154. 157
Key Innovations in Dynamic
Instruction Scheduling
Reservation stations:
− The single-entry buffer at the head of each
functional unit is replaced by a multiple-entry
buffer.
Common Data Bus (CDB):
− Connects the output of the functional units to
the reservation stations as well as registers.
Register Tags:
− Tag corresponds to the reservation station entry
number for the instruction producing the result.
155. 158
Reservation Stations
The basic idea:
−An instruction waits in the reservation
station, until its operands become
available.
Helps overcome RAW hazards.
−A reservation station fetches and
buffers an operand as soon as it is
available:
Eliminates the need to get operands from
registers.
156. 159
Tomasulo’s Algorithm
Control & buffers distributed with Function Units (FU)
−In the form of “reservation stations” associated
with every function unit.
−Store operands for issued but pending instructions.
Register names in instructions are replaced either by
values or by pointers to reservation stations (RS):
−Achieves register renaming.
−Avoids WAR, WAW hazards without stalling.
−Many more reservation stations than registers
(why?), so can do optimizations that compilers can’t.
157. 160
Tomasulo’s Algorithm cont…
Results passed to FUs from RSs:
− Not through registers; therefore similar to
forwarding.
− Over Common Data Bus (CDB) that
broadcasts results to all FUs.
Load and Stores:
− Treated as FUs with RSs as well.
Integer instructions can go past branches:
− Allows FP ops beyond basic block in FP queue.
159. 162
Three Stages of Tomasulo
Algorithm
1. Issue: Get instruction from the Instruction Queue.
− Issue the instruction only if a matching reservation
station is free (otherwise a structural hazard).
− If the operand values are currently in the registers,
copy them into the station; otherwise record the
functional unit that will produce each operand
(this renaming avoids WAR and WAW hazards).
160. 163
Three Stages of Tomasulo
Algorithm
2. Execute: Operate on operands (EX)
−If one or more of the operands is not yet available,
monitor the common data bus while waiting for it to
be computed.
− When both operands ready then execute;
if not ready, watch Common Data Bus for result
−RAW hazards are thereby avoided.
3. Write result: Finish execution (WB)
−When result is available, write it on the CDB and
from there into the registers and into the
reservation stations waiting for this result.
− Write on CDB to all awaiting units; mark
reservation station available.
161. 164
Different parts of Tomasulo
Instruction Status Unit:
− Indicates which of the 3 phases the instruction is in.
Reservation Station Status Unit:
− Indicates the state of each reservation station; each
reservation station has 7 fields.
1. Busy: Indicates whether the reservation station and its
accompanying functional unit are busy or free.
2. Op: The type of operation to be performed in the
reservation station.
3. Vj, Vk: The values of the source operands (for loads and
stores, Vk holds the offset field).
4. Qj, Qk: The reservation stations that will produce the
corresponding source operand (blank means the value is
already in Vj or Vk).
5. A: Holds information for the memory-address calculation
for loads and stores.
162. 165
Different parts of Tomasulo
Register Result Status Unit:
− Indicates which reservation station will write each
register, if an active instruction has the register as
its destination.
− The field Qi is left blank whenever there is no
pending instruction that will write that register.
− Otherwise, Qi holds the number of the reservation
station containing the operation whose result will be
stored into the register.
163. 166
Steps in Execution with Tomasulo Approach
• Write the steps of execution with the Tomasulo approach. Assume
we have 2 multipliers, 3 adders & 2 load units. The following
set of MIPS instructions is to be executed in a pipelined
system.
• Example:
• LOAD F6, 34(R2)
• LOAD F2, 45(R3)
• MUL F0, F2, F4
• SUB F8, F6, F2
• DIV F10, F0, F6
• ADD F6, F8, F2
• Consider the following execution status for the above instruction set:
• The 1st LOAD instruction has completed its write-result phase.
• The 2nd LOAD instruction has completed its execute phase.
• The other remaining instructions are in their issue phase.
• Show the status of the instructions, reservation stations, & register
result status using dynamic scheduling with the Tomasulo approach.
164. 167
Steps in Execution with Tomasulo Approach
Write the steps of execution with the Tomasulo approach. Assume
we have 2 multipliers, 3 adders & 2 load units. The following set
of MIPS instructions is to be executed in a pipelined
system.
Example:
• LOAD F6, 34(R2)
• LOAD F2, 45(R3)
• MUL F0, F2, F4
• SUB F8, F6, F2
• DIV F10, F0, F6
• ADD F6, F8, F2
Consider the following execution status for the above instruction set:
• The first two LOAD & the SUB instructions have completed their
write-result phase.
• The MUL & DIV instructions are ready to go to their write-result phase.
• The ADD instruction is in its execution phase.
Show the status of the instructions, reservation stations, & register
result status using dynamic scheduling with the Tomasulo approach.
165. 168
Steps in Execution with Tomasulo Approach
Write the steps of execution with the Tomasulo approach. Assume
we have 2 multipliers, 3 adders & 2 load units. The following set
of MIPS instructions is to be executed in a pipelined
system.
Example:
• LOAD F6, 34(R2)
• LOAD F2, 45(R3)
• MUL F0, F2, F4
• SUB F8, F6, F2
• DIV F10, F0, F6
• ADD F6, F8, F2
Consider the following execution status for the above instruction set:
• The first two LOAD, MUL, SUB & ADD instructions have completed their
write-result phase.
• The DIV instruction is ready to write its result.
Show the status of the instructions, reservation stations & register
result status using dynamic scheduling with the Tomasulo approach.
167. 170
Tomasulo Example Cycle 0
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 Load1 No
LD F2 45+ R3 Load2 No
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
0 FU
Address
168. 171
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 Load1 Yes
LD F2 45+ R3 Load2 No
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
1 FU Load1
Address
34+R2
Tomasulo Example Cycle 1
169. 172
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2- Load1 Yes
LD F2 45+ R3 2 Load2 Yes
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
(Assume Load takes 2 cycles)
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Load2 Load1
Address
34+R2
45+R3
Tomasulo Example Cycle 2
170. 173
Tomasulo Example Cycle 3
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 Load1 Yes
LD F2 45+ R3 2 3- Load2 Yes
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 Yes Mult R(F4) Load2
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
3 FU Mult1 Load2 Load1
Address
34+R2
45+R3
171. 174
Tomasulo Example Cycle 4
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 Load2 Yes
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 Yes Sub M(A1) Load2
0 Add2 No
Add3 No
0 Mult1 Yes Mult R(F4) Load2
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
4 FU Mult1 Load2 M(A1) Add1
Address
45+R3
172. 175
Tomasulo Example Cycle 5
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6 5
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
2 Add1 Yes Sub M(A1) M(A2)
0 Add2 No
Add3 No
10 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Mult1 M(A2) M(A1) Add1 Mult2
Address
173. 176
Tomasulo Example Cycle 6
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 --
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
1 Add1 Yes Sub M(A1) M(A2)
0 Add2 Yes Add M(A2) Add1
Add3 No
9 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
6 FU Mult1 M(A2) Add2 Add1 Mult2
Address
174. 177
Tomasulo Example Cycle 7
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 Yes Sub M(A1) M(A2)
0 Add2 Yes Add M(A2) Add1
Add3 No
8 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult1 M(A2) Add2 Add1 Mult2
Address
175. 178
Tomasulo Example Cycle 8
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
2 Add2 Yes Add M1-M2 M(A2)
Add3 No
7 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 M(A2) Add2 M1-M2 Mult2
Address
176. 179
Tomasulo Example Cycle 9
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 --
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
1 Add2 Yes Add M1-M2 M(A2)
Add3 No
6 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
9 FU Mult1 M(A2) Add2 M1-M2 Mult2
Address
177. 180
Tomasulo Example Cycle 10
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 Yes Add M1-M2 M(A2)
Add3 No
5 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
10 FU Mult1 M(A2) Add2 M1-M2 Mult2
Address
178. 181
Tomasulo Example Cycle 11
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
4 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
11 FU Mult1 M(A2) M1-M2+M(A2) M1-M2 Mult2
Address
179. 182
Tomasulo Example Cycle 12
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
4 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
12 FU Mult1 M(A2) M1-M2+M(A2) M1-M2 Mult2
Address
180. 183
Tomasulo Example Cycle 15
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
0 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
15 FU Mult1 M(A2) M1-M2+M(A2) M1-M2 Mult2
Address
181. 184
Tomasulo Example Cycle 16
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 16 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
Mult1 No
40 Mult2 Yes Div M2*F4 M(A1)
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
16 FU M*F4 M(A2) M1-M2+M(A2) M1-M2 Mult2
Address
182. 185
Tomasulo Example Cycle 56
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 16 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5 17 -- 56
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
Mult1 No
0 Mult2 Yes Div M*F4 M(A1)
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
56 FU M*F4 M(A2) M1-M2+M(A2) M1-M2 Mult2
Address
183. 186
Tomasulo Example Cycle 57
Instruction status Execution Write
Instruction j k Issue complete Result Busy
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 16 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5 17 -- 56 57
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
57 FU M*F4 M(A2) M1-M2+M(A2) M1-M2 result
Address
185. 188
Tomasulo’s Scheme:
Drawbacks
Performance is limited by CDB:
− CDB connects to multiple functional units
⇒ high capacitance, high wiring density
− Number of functional units that can
complete per cycle limited to one!
Multiple CDBs ⇒ more FU logic for parallel
stores.
Imprecise exceptions!
− Effective handling is a major performance
bottleneck.
186. 189
Why Can Tomasulo’s Scheme
Overlap Iterations of Loops?
Register renaming using reservation
stations:
−Avoids the WAR stall that occurred in
the scoreboard.
−Also, multiple iterations use different
physical destinations facilitating
dynamic loop unrolling.
187. 190
Tomasulo’s Scheme Offers
Three Major Advantages
1. Distribution of hazard detection logic:
− Distributed reservation stations.
− If multiple instructions wait on a single result,
they can all be released simultaneously by the
broadcast on the CDB.
− If a centralized register file were used,
the units would have to read their results from the registers.
2. Elimination of stalls for WAW and WAR
hazards.
3. Possible to have superscalar execution:
− Because results directly available to FUs, rather
than from registers.
189. 192
A Practice Problem on
Dependence Analysis
Identify all dependences in the following
code.
Transform the code to eliminate the
dependences.
for(i=1;i<1000;i++){
y[i]=x[i]/c;
x[i]=x[i]+c;
z[i]=y[i]+c;
y[i]=c-y[i];
}
191. 194
Two Paths to Higher ILP
Superscalar processors:
−Multiple issue, dynamically scheduled,
speculative execution, branch prediction
−More hardware functionalities and
complexities.
VLIW:
−Lets the compiler take on the complexity.
−Simple hardware, smart compiler.
192. 195
Very Long Instruction Word
(VLIW) Processors
Hardware cost and complexity of superscalar
schedulers is a major consideration in
processor design.
− VLIW processors rely on compile time analysis to
identify and bundle together instructions that can
be executed concurrently.
These instructions are packed and dispatched
together,
− Thus the name very long instruction word.
− This concept is employed in the Intel IA64
processors.
193. 196
VLIW Processors
The compiler has complete
responsibility for selecting a set of
instructions:
− These can be executed concurrently.
VLIW processors have static
instruction issue capability:
− As compared, superscalar processors
have dynamic issue capability.
194. 197
The Basic VLIW
Approach
VLIW processors deploy multiple
independent functional units.
Early VLIW processors operated in
lock step:
−There was no hazard detection in
hardware at all.
−A stall in any functional unit causes
the entire pipeline to stall.
195. 198
VLIW Processors
Assume a 4-issue static superscalar
processor:
− During fetch stage, 1 to 4 instructions would
be fetched.
− The group of instructions that could be issued
in a single cycle are called:
An issue packet or a Bundle.
− If an instruction could cause a structural or
data hazard:
It is not issued.
196. 199
VLIW (Very Long Instruction
Word)
One single VLIW instruction:
− separately targets different functional units.
MultiFlow TRACE, TI C6X, IA-64
Schematic of a VLIW instruction bundle:
FU: add r1,r2,r3 | FU: load r4,r5+4 | FU: mov r6,r2 | FU: mul r7,r8,r9
197. 200
Example of Scheduling by the Compiler
How would these operations be scheduled by the
compiler?
1. ADD r1, r2, r3
2. SUB r16, r14, r7
3. LD r2, (r4)
4. LD r14, (r15)
5. MUL r5, r1, r9
6. ADD r9, r10, r11
7. SUB r12, r2, r14
Assume:
•Number of execution units = 3
•Every operation has a latency of 2 cycles
•Any execution unit can execute any operation
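One possible answer, assuming operands are read at issue (so a write that completes at the end of a cycle is not seen by operations issued in that cycle or earlier): the true dependences are 1→5 (r1), 3→7 (r2) and 4→7 (r14); the antidependences 1→3 (r2), 2→4 (r14) and 5→6 (r9) are satisfied by placing each reader in the same bundle as, or an earlier bundle than, the corresponding writer. The operations then issue over 4 cycles:

```
Cycle 1:  ADD r1, r2, r3   | SUB r16, r14, r7 | LD  r2, (r4)
Cycle 2:  LD  r14, (r15)   | nop              | nop
Cycle 3:  MUL r5, r1, r9   | ADD r9, r10, r11 | nop
Cycle 4:  SUB r12, r2, r14 | nop              | nop
```

With the stated 2-cycle latency, the final result (r12) is available at the end of cycle 5. Other 4-cycle schedules exist; this one is just a sketch under the stated assumptions.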
199. 202
VLIW Processors: Some
Considerations
Issue hardware is simpler.
Compiler has a bigger context from which to
select co-scheduled instructions.
Compilers, however, do not have runtime
information such as cache misses.
− Scheduling is, therefore, inherently conservative.
− Branch and memory prediction is more difficult.
Typical VLIW processors are limited to 4-way
to 8-way parallelism.
200. 203
VLIW Summary
Each “instruction” is very large
−Bundles multiple operations that are independent.
Compiler detects hazards and determines
scheduling.
There is no (or only partial) hardware hazard
detection:
−No dependence check logic for instructions issued
at the same cycle.
Trades off instruction space for simple
decoding:
−The long instruction word has room for many
operations.
−But NOPs must be inserted when enough
independent operations cannot be found.
201. 204
VLIW vs Superscalar
VLIW - Compiler finds parallelism:
−Superscalar – hardware finds parallelism
VLIW – Simpler hardware:
−Superscalar – More complex hardware
VLIW – less parallelism can be
exploited for a typical program:
−Superscalar – Better performance
202. 205
Superscalar Execution
Scheduling of instructions is determined by a
number of factors:
− True Data Dependency: The result of one
operation is an input to the next.
− Resource constraints: Two operations require the
same resource.
− Branch Dependency: Scheduling instructions
across conditional branches cannot be
done deterministically a priori.
An appropriate number of instructions is
issued per cycle:
− A superscalar processor of degree m can issue
up to m instructions per cycle.
203. 206
Superscalar Execution With
Dynamic Scheduling
Multiple instruction issue:
−Very well accommodated with dynamic
instruction scheduling approach.
The issue stage can be:
−Replicated, pipelined, or both.
204. 207
A Superscalar MIPS Processor
Assume two instructions can be
issued per clock cycle:
−One of the instructions can be load,
store, or integer ALU operations.
−The other can be a floating point
operation.
206. 209
Limitations of Scalar Pipelines:
A Reflection
Maximum throughput bounded by one
instruction per cycle.
Inefficient unification of instructions
into one pipeline:
−ALU and MEM stages are very diverse, e.g., FP.
Rigid nature of in-order pipeline:
−If a leading instruction is stalled, every
subsequent instruction is stalled.
208. 211
Solving Problems of Scalar
Pipelines: Modern Processors
Maximum throughput bounded by one
instruction per cycle:
−parallel pipelines (superscalar)
Inefficient unification into a single
pipeline:
−diversified pipelines.
Rigid nature of in-order pipeline:
−Allow out-of-order execution, i.e.,
dynamic instruction scheduling.
209. 212
Difference Between Superscalar
and Super-Pipelined
• Super-Pipelined
– Many pipeline stages need less than half a
clock cycle
– Double internal clock speed gets two tasks per
external clock cycle
• Superscalar
– Allows for parallel execution of independent
instructions