The document summarizes a chapter on pipelining and superscalar techniques from a textbook on advanced computer architecture. It covers various types of pipeline processors like linear, non-linear, instruction and arithmetic pipelines. It discusses concepts like pipeline stages, hazards, forwarding, branching techniques. It also covers superscalar design parameters, dependencies, scheduling and their performance analysis. The document is presented by an assistant professor and contains diagrams to explain different pipelining concepts.
The document discusses different types of parallelism that can be utilized in parallel database systems: I/O parallelism to retrieve relations from multiple disks in parallel, interquery parallelism to run different queries simultaneously, intraquery parallelism to parallelize operations within a single query, and intraoperation parallelism to parallelize individual operations like sort and join. It also covers techniques for partitioning relations across disks and handling skew to balance the workload.
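The partitioning techniques mentioned above can be sketched in a few lines. This is an illustrative model only (function and key names are made up for the example, not taken from any specific database system):

```python
# Sketch of round-robin, hash, and range partitioning of tuples across disks.
# All names here are illustrative, not from any particular parallel DBMS.

def round_robin_partition(tuples, n_disks):
    """Send tuple i to disk i mod n: perfectly even, good for scans."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks

def hash_partition(tuples, key, n_disks):
    """Send each tuple to the disk chosen by hashing its partitioning key."""
    disks = [[] for _ in range(n_disks)]
    for t in tuples:
        disks[hash(t[key]) % n_disks].append(t)
    return disks

def range_partition(tuples, key, boundaries):
    """boundaries = [b0, b1, ...] splits the key space into len+1 ranges."""
    disks = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        i = sum(1 for b in boundaries if t[key] >= b)
        disks[i].append(t)
    return disks
```

Skew shows up directly in such a model: if many tuples share one key value, hash and range partitioning can overload a single disk while round-robin stays balanced.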
The document summarizes the RISC pipeline architecture. It discusses the five stages of the classic RISC pipeline: instruction fetch, instruction decode, execute, memory access, and writeback. Each stage is involved in processing one instruction at a time through the pipeline. The instruction fetch stage retrieves instructions from the instruction cache. The decode stage decodes the instruction and computes branch targets. The execute stage performs arithmetic and logical operations. The memory access stage handles data memory access. Finally, the writeback stage writes results back to registers. The document also discusses hazards like structural, data, and control hazards that can occur in pipelines.
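The five-stage flow described above can be sketched as a simple cycle diagram. This is a minimal idealized model (no stalls, names chosen for the example):

```python
# Minimal sketch: which cycle each stage of each instruction occupies
# in an ideal five-stage RISC pipeline with no hazards.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    """Return, per instruction, the cycle in which each stage runs."""
    diagram = []
    for i in range(n_instructions):
        # Instruction i enters IF at cycle i, then advances one stage per cycle.
        diagram.append({stage: i + s for s, stage in enumerate(STAGES)})
    return diagram

def total_cycles(n_instructions):
    # With no stalls, k instructions finish in (k - 1) + number-of-stages cycles.
    return (n_instructions - 1) + len(STAGES)
```

Hazards break this ideal picture: a structural, data, or control hazard delays a stage and shifts every later entry in the diagram.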
This document discusses parallel processing and pipelining. It describes different levels and types of parallel processing including job level, task level, inter-instruction level, and intra-instruction level parallelism. It also covers Flynn's classification of parallel computers as SISD, SIMD, MISD, and MIMD based on the number of instruction and data streams. Pipelining is defined as decomposing a process into sub-operations that execute concurrently. The key benefits of pipelining are that multiple computations can progress simultaneously through different pipeline stages.
This document outlines the syllabus for a course on computer organization and architecture. The syllabus covers 10 units: 1) introduction to computers, 2) register transfer and micro-operations, 3) computer arithmetic, 4) programming the basic computer, 5) central processing unit organization, 6) input-output organization, 7) memory organization, 8) parallel processing, 9) vector processing, and 10) multiprocessors. Key topics include Von Neumann architecture, computer generations, instruction execution, registers, buses, arithmetic logic units, assembly language, and memory hierarchies. References for the course are also provided.
This document discusses computer arithmetic and floating point representation. It begins with an introduction to computer arithmetic and covers topics like addition, subtraction, multiplication, division and their algorithms. It then discusses floating point representation which uses scientific notation to represent real numbers. Key aspects covered include single and double precision formats, normalized and denormalized numbers, overflow and underflow, and biased exponent representation. Examples are provided to illustrate floating point addition and multiplication. The document also discusses floating point instructions in MIPS and the need for accurate arithmetic in floating point operations.
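The single-precision format and biased exponent mentioned above can be inspected directly from Python. This sketch decodes the IEEE 754 fields of a 32-bit float:

```python
import struct

# Decode the IEEE 754 single-precision fields: 1 sign bit,
# 8-bit biased exponent (bias 127), and 23-bit fraction.

def float32_fields(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF   # stored exponent = true exponent + 127
    fraction = bits & 0x7FFFFF         # 23-bit fraction (hidden leading 1)
    return sign, biased_exp, fraction

# 1.0 = (-1)^0 * 1.0 * 2^(127 - 127): sign 0, biased exponent 127, fraction 0
sign, e, f = float32_fields(1.0)
```

A biased exponent of 0 with a nonzero fraction marks the denormalized numbers the summary mentions; all-ones exponents encode infinities and NaNs.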
Chapter 2: Instructions - Language of the Computer (BATMUNHMUNHZAYA)
The document discusses the MIPS instruction set architecture. It describes the different types of instructions including arithmetic, logical, and conditional instructions. It explains the register-based and memory-based operands, immediate operands, and how instructions are encoded in binary machine code. Key aspects like simplicity, regularity, and optimization for common cases are emphasized in the design of the MIPS ISA.
The document provides an overview of parallel processing and multiprocessor systems. It discusses Flynn's taxonomy, which classifies computers as SISD, SIMD, MISD, or MIMD based on whether they process single or multiple instructions and data in parallel. The goals of parallel processing are to reduce wall-clock time and solve larger problems. Multiprocessor topologies include uniform memory access (UMA) and non-uniform memory access (NUMA) architectures.
This chapter discusses instruction sets and addressing modes. It covers common addressing modes like immediate, direct, indirect, register, register indirect, displacement, and stack addressing. It provides examples and diagrams to illustrate each addressing mode. The chapter also discusses instruction formats for different architectures like PDP-8, PDP-10, PDP-11, VAX, x86, and ARM. It covers ARM's load/store multiple addressing and thumb instruction set. The chapter concludes with an overview of assemblers and a simple example program.
Pipelining is a technique where multiple instructions are overlapped during execution by dividing the instruction cycle into stages connected in a pipeline structure. There are two main types of pipelines: instruction pipelines, which overlap the fetch, decode, and execute phases of instructions to improve throughput, and arithmetic pipelines, which are used for floating-point and fixed-point operations. Pipeline conflicts can occur due to timing variations, data hazards, branching, interrupts, or data dependency, all of which reduce the pipeline's performance. The main advantages of pipelining are reduced cycle time, increased throughput, and improved reliability, while the main disadvantages are increased complexity, cost, and instruction latency.
Assembly language is a low-level programming language that corresponds directly to a processor's machine language instructions. It uses symbolic codes that are assembled into machine-readable object code. Assembly languages are commonly used when speed, compact code size, or direct hardware interaction is important. Assemblers translate assembly language into binary machine code that can be directly executed by processors.
Single Instruction Multiple Data (SIMD) is an approach to improve performance by replicating data paths rather than control. Vector processors apply the same operation to all elements of a vector in parallel. The ILLIAC IV was an early SIMD computer from 1972 with 64 processing elements. Vector processors store vectors in registers and apply the same instruction to all elements simultaneously. The Cray-1 was an influential vector supercomputer from 1978 that used vector registers and optimized memory access for vectors. Vectorization improves performance by performing the same operation on multiple data elements with a single instruction.
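The SIMD idea above can be modeled in a few lines: one "instruction" applies the same scalar operation to every element of two vector registers in lockstep. Real vector hardware does this in parallel; the Python loop below only models the semantics (names are illustrative):

```python
# Toy model of a vector instruction: the same operation is applied
# to all element pairs of two vector registers at once.

def vector_op(op, v1, v2):
    """Apply one scalar operation element-wise, SIMD-style."""
    assert len(v1) == len(v2), "vector registers must have equal length"
    return [op(a, b) for a, b in zip(v1, v2)]

v1 = [1.0, 2.0, 3.0, 4.0]
v2 = [10.0, 20.0, 30.0, 40.0]
v3 = vector_op(lambda a, b: a + b, v1, v2)   # one "vector add" instruction
```

The performance win on real hardware comes from issuing one instruction instead of a fetch-decode-execute cycle per element, exactly the point the Cray-1's vector registers exploited.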
A spline is a flexible strip used to produce a smooth curve through a designated set of points.
Polynomial sections are fitted so that the curve passes through each control point; the resulting curve is said to interpolate the set of control points.
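One simple way a polynomial section can pass through every control point is Lagrange interpolation. This sketch fits a single section; a spline would fit several such sections and join them smoothly (the function name and sample points are chosen for the example):

```python
# Evaluate the Lagrange interpolating polynomial through the given
# control points at x. The curve passes exactly through each point,
# i.e. it interpolates the control points.

def lagrange(points, x):
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]  # these lie on y = x^2
```

Evaluating at any control point's x returns exactly that point's y, which is the defining property of an interpolating curve.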
An explicitly parallel program must specify concurrency and interaction between concurrent subtasks.
The former is sometimes also referred to as the control structure and the latter as the communication model.
Topic: I/O Buffering in Operating Systems
Discussion Points:
-Why we need buffering?
-Problems in buffering
-Types of I/O Devices
-Types of Buffer
-Types of approaches to buffering
-Which type of buffer is used with which type of I/O device and how?
This document discusses linkers and loaders from a programmer's perspective. It covers key concepts like object files, program loading, linking with static and dynamic libraries, symbol resolution, relocation, position independent code, and shared libraries. The main points are that linkers combine object files and resolve symbols, loaders load programs into memory, static libraries are linked at compile time, dynamic libraries allow sharing of code and dynamic loading, and position independent code allows shared libraries to be loaded at any address.
Greibach Normal Form (GNF) is a type of context-free grammar where the right-hand side of each production rule consists of a single terminal symbol followed by zero or more non-terminal symbols. The document discusses GNF, provides an example grammar in GNF, describes two lemmas used to convert grammars to GNF, and shows the procedure and steps to convert an arbitrary context-free grammar into GNF. It also provides examples of converting several grammars to GNF and solving related problems.
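The GNF condition described above is mechanical to check: every right-hand side must start with exactly one terminal and continue with only non-terminals. A small sketch (the grammar encoding is an assumption made for this example):

```python
# Check whether a context-free grammar is in Greibach Normal Form:
# each production's RHS is one terminal followed by zero or more
# non-terminals. grammar: dict {non-terminal: [list of RHS symbol lists]}.

def is_gnf(grammar, terminals, nonterminals):
    for lhs, productions in grammar.items():
        for rhs in productions:
            if not rhs or rhs[0] not in terminals:
                return False            # RHS must begin with a terminal
            if any(sym not in nonterminals for sym in rhs[1:]):
                return False            # everything after it must be non-terminals
    return True

# S -> aAB | b,  A -> a,  B -> bA   (this grammar is in GNF)
g = {"S": [["a", "A", "B"], ["b"]], "A": [["a"]], "B": [["b", "A"]]}
```

A production like S -> Aa fails the check because its RHS begins with a non-terminal, which is exactly what the conversion lemmas eliminate.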
The document discusses instruction set architecture (ISA), which is part of computer architecture related to programming. It defines the native data types, instructions, registers, addressing modes, and other low-level aspects of a computer's operation. Well-known ISAs include x86, ARM, MIPS, and RISC. A good ISA lasts through many implementations, supports a variety of uses, and provides convenient functions while permitting efficient implementation. Assembly language is used to program at the level of an ISA's registers, instructions, and execution order.
This document provides an overview of the chapters and content covered in a textbook on computer organization and architecture. The chapters cover digital logic circuits, digital components, data representation, register transfer and microoperations, basic computer organization and design, programming and instruction sets, control units, processor design, pipelining and parallel processing, arithmetic, input/output, and memory organization. Key concepts discussed include logic gates, boolean algebra, combinational and sequential circuits, registers, buses, arithmetic and logic operations, and memory.
The document discusses code generation in compilers. It describes the main tasks of the code generator as instruction selection, register allocation and assignment, and instruction ordering. It then discusses various issues in designing a code generator such as the input and output formats, memory management, different instruction selection and register allocation approaches, and choice of evaluation order. The target machine used is a hypothetical machine with general purpose registers, different addressing modes, and fixed instruction costs. Examples of instruction selection and utilization of addressing modes are provided.
This document discusses common bus systems for transferring data between registers in a computer. It explains that a bus structure uses common lines and control signals to connect one register at a time to transfer information. One method for constructing a common bus system is to use multiplexers to select the source register and place its binary information on the shared bus lines. Memory transfers refer to read and write operations that transfer data between memory and registers over the bus.
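The multiplexer-based bus described above reduces to a selection function: in each cycle, the select lines decide which source register drives the shared lines. A toy sketch (register values and names are made up for illustration):

```python
# Model of a multiplexer-based common bus: the select input chooses
# which one of the source registers is placed on the shared bus lines.

def bus_mux(registers, select):
    """k-way multiplexer: exactly one register drives the bus at a time."""
    return registers[select]

R = [0b0001, 0b0010, 0b0100, 0b1000]  # four 4-bit source registers
bus = bus_mux(R, 2)                    # control signals select register R2
```

The key property the summary states is visible here: only one register is connected to the bus per transfer, so the control unit must serialize register-to-register moves over the shared lines.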
Interleaved memory is a design that spreads memory addresses across multiple memory banks to compensate for the relatively slow speed of DRAM. It increases bandwidth and improves performance by allowing different modules to be accessed independently and in parallel by different processing units like a CPU and hard disk. There are two address formats for interleaved memory: low order interleaving which spreads addresses across banks, and high order interleaving which uses high order bits as the module address.
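The two address formats mentioned above differ only in which bits select the bank. A sketch, assuming 4 banks and 16-word modules (both parameters chosen for the example):

```python
# Decompose an address into (bank, offset) under the two interleaving
# schemes. Assumed geometry: 4 banks, 16 words per bank.

N_BANKS = 4
WORDS_PER_BANK = 16

def low_order_interleave(addr):
    """Low-order bits pick the bank: consecutive addresses land in
    different banks, so sequential accesses proceed in parallel."""
    return addr % N_BANKS, addr // N_BANKS

def high_order_interleave(addr):
    """High-order bits pick the bank: each module holds one contiguous
    block of the address space."""
    return addr // WORDS_PER_BANK, addr % WORDS_PER_BANK
```

With low-order interleaving, addresses 0, 1, 2, 3 hit banks 0, 1, 2, 3, which is why it raises bandwidth for sequential access; high-order interleaving instead isolates different processors or devices in different modules.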
Graphic hardware and software are used to create and edit images. Common hardware includes RAM, CPU, graphics cards, and hard drives which are used to store and process graphical files. Software like Photoshop, IPhoto, and Paint allow users to edit photos with different tools and levels of complexity. File formats like JPG, PNG, and TIFF determine image quality and compression for different uses. While powerful programs and devices provide robust editing, they can also be expensive and complex for beginners.
This document discusses pipelining in microprocessors. Pipelining allows a microprocessor to begin executing the next instruction before the previous one has finished. It works by breaking down the instruction process into stages, like an assembly line, where each stage performs part of the process on individual instructions in parallel. The key steps are to divide the process into subtasks that form pipeline stages, have each stage perform an operation on a set of operands, pass results to the next stage, and feed a new set of inputs to each stage. Pipelining can be classified by the level of processing, configuration, and type of instruction and data.
This document discusses computer architecture concepts such as machine language, assembly language, and instruction set architecture. It then describes different addressing architectures including memory-to-memory, register-to-register, and register-memory. Various addressing modes are also defined, such as implied, immediate, indirect, relative, and indexed addressing modes. Finally, the document briefly discusses stack instructions and the use of a stack pointer register.
This document discusses superscalar and super pipeline approaches to improving processor performance. Superscalar processors execute multiple independent instructions in parallel using multiple pipelines. Super pipelines break pipeline stages into smaller stages to reduce clock period and increase instruction throughput. While superscalar utilizes multiple parallel pipelines, super pipelines perform multiple stages per clock cycle in each pipeline. Super pipelines benefit from higher parallelism but also increase potential stalls from dependencies. Both approaches aim to maximize parallel instruction execution but face limitations from true data and other dependencies.
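The throughput trade-off above can be made concrete with the standard ideal-case cycle counts. This is a back-of-the-envelope model only, ignoring the dependency stalls the summary warns about:

```python
# Ideal completion time (in base-cycle units) for N instructions on:
# a k-stage base pipeline, a degree-m superscalar machine, and a
# superpipeline splitting each stage into n shorter substages.

def base_pipeline_cycles(n_instr, k):
    return k + (n_instr - 1)

def superscalar_cycles(n_instr, k, m):
    # m instructions issued per cycle across m parallel pipelines
    return k + (n_instr / m - 1)

def superpipeline_cycles(n_instr, k, n):
    # each substage cycle is 1/n of a base cycle, so in base-cycle
    # units one instruction completes every 1/n cycles after fill
    return k + (n_instr - 1) / n
```

For large N both approaches approach an m-fold (or n-fold) speedup over the base pipeline; real machines fall short because true data dependencies limit how many independent instructions are available.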
This chapter discusses computer abstractions and technology. It covers the hardware/software interface and how high-level programs are translated to machine code. The chapter also examines different types of computers like PCs, servers, and embedded systems. It describes how computers use layers of abstraction in both hardware and software. The chapter concludes by discussing performance measures like response time and throughput, and how techniques like parallelism can improve performance within power constraints.
Line Drawing Algorithms - Computer Graphics - Notes (Omprakash Chauhan)
Straight-line drawing algorithms are based on incremental methods.
In the incremental method, the line starts from an initial point; a fixed increment is then added to the current point to obtain the next point on the line, and this continues until the end of the line is reached.
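The incremental idea above is exactly what the DDA (digital differential analyzer) algorithm does: step along the major axis and add a fixed increment to the other coordinate each time. A minimal sketch, assuming the two endpoints differ:

```python
# DDA-style incremental line drawing: start at (x0, y0) and repeatedly
# add fixed per-step increments until reaching (x1, y1).

def dda_line(x0, y0, x1, y1):
    steps = max(abs(x1 - x0), abs(y1 - y0))   # one step per major-axis unit
    x_inc = (x1 - x0) / steps
    y_inc = (y1 - y0) / steps
    x, y = float(x0), float(y0)
    points = [(round(x), round(y))]
    for _ in range(steps):
        x += x_inc                             # the fixed increment
        y += y_inc
        points.append((round(x), round(y)))
    return points
```

Rounding each incremented coordinate to the nearest pixel is what turns the continuous line equation into the discrete raster points.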
Here are the key steps in designing a pipelined processor:
1. Identify the stages in the instruction execution pipeline, such as fetch, decode, execute, memory, writeback.
2. Associate functional units and resources with each pipeline stage. For example, register file access in execute stage.
3. Examine the datapath and control signals to ensure data and resource dependencies flow correctly through the pipeline with no conflicts.
4. Add pipeline registers between stages to break up instruction flow into discrete packets and enable overlapped execution.
5. Design pipeline control logic to assert appropriate control signals in each stage, such as read/write enables for register file.
6. Handle exceptions and hazards properly, using techniques such as stalling, forwarding, and pipeline flushing.
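The steps above can be sketched as a toy timing model: each instruction advances one stage per cycle, and a detected hazard delays the dependent instruction by some number of bubble cycles. This is an illustrative model of the timing, not a control-logic design:

```python
# Toy timing model for a five-stage pipeline with stall bubbles.
# stalls: dict {instruction_index: bubble cycles inserted before issue}.

STAGES = 5

def issue_cycles(n_instr, stalls):
    """Cycle at which each instruction enters the pipeline."""
    cycles = []
    next_issue = 0
    for i in range(n_instr):
        next_issue += stalls.get(i, 0)   # hazard logic inserts bubbles here
        cycles.append(next_issue)
        next_issue += 1
    return cycles

def completion_cycle(n_instr, stalls):
    """Cycle in which the last instruction leaves the writeback stage."""
    return issue_cycles(n_instr, stalls)[-1] + STAGES - 1
```

With no stalls three instructions complete at cycle 6; two bubbles before the second instruction push completion to cycle 8, which is the performance cost the hazard-handling logic in step 6 tries to minimize.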
The document discusses linear and nonlinear instruction pipelines. It describes different stages in a linear pipeline like fetch, decode, operand fetch, execute, and write results. It also discusses different types of dependencies like data and instruction dependencies that can occur in pipelines and different solutions to handle them like stalling, forwarding, branch prediction etc. The document further explains the design of nonlinear pipelines using concepts like latency sequence, collision vector, forbidden latencies and provides an example pipeline for multiply and add operations.
This chapter discusses instruction sets and addressing modes. It covers common addressing modes like immediate, direct, indirect, register, register indirect, displacement, and stack addressing. It provides examples and diagrams to illustrate each addressing mode. The chapter also discusses instruction formats for different architectures like PDP-8, PDP-10, PDP-11, VAX, x86, and ARM. It covers ARM's load/store multiple addressing and thumb instruction set. The chapter concludes with an overview of assemblers and a simple example program.
Pipelining is a technique where multiple instructions are overlapped during execution by dividing the instruction cycle into stages connected in a pipeline structure. There are two main types of pipelines - instruction pipelines which overlap the fetch, decode, and execute phases of instructions to improve throughput, and arithmetic pipelines used for floating point and fixed point operations. Pipeline conflicts can occur due to timing variations, data hazards, branching, interrupts, or data dependency which reduce the pipeline's performance. The main advantages of pipelining are reduced cycle time, increased throughput, and improved reliability, while the main disadvantage is increased complexity, cost, and instruction latency.
Assembly language is a low-level programming language that corresponds directly to a processor's machine language instructions. It uses symbolic codes that are assembled into machine-readable object code. Assembly languages are commonly used when speed, compact code size, or direct hardware interaction is important. Assemblers translate assembly language into binary machine code that can be directly executed by processors.
Single Instruction Multiple Data (SIMD) is an approach to improve performance by replicating data paths rather than control. Vector processors apply the same operation to all elements of a vector in parallel. The ILLIAC IV was an early SIMD computer from 1972 with 64 processing elements. Vector processors store vectors in registers and apply the same instruction to all elements simultaneously. The Cray-1 was an influential vector supercomputer from 1978 that used vector registers and optimized memory access for vectors. Vectorization improves performance by performing the same operation on multiple data elements with a single instruction.
a spline is a flexible strip used to produce a smooth curve through a designated set of points.
Polynomial sections are fitted so that the curve passes through each control point, Resulting curve is said to interpolate the set of control points.
An explicitly parallel program must specify concurrency and interaction between concurrent subtasks.
The former is sometimes also referred to as the control structure and the latter as the communication model.
Topic: I/O Buffering in Operting System
Discussion Points:
-Why we need buffering?
-Problems in buffering
-Types of I/O Devices
-Types of Buffer
-Types of approaches to buffering
-Which type of buffer is used with which type of I/O device and how?
This document discusses linkers and loaders from a programmer's perspective. It covers key concepts like object files, program loading, linking with static and dynamic libraries, symbol resolution, relocation, position independent code, and shared libraries. The main points are that linkers combine object files and resolve symbols, loaders load programs into memory, static libraries are linked at compile time, dynamic libraries allow sharing of code and dynamic loading, and position independent code allows shared libraries to be loaded at any address.
Greibach Normal Form (GNF) is a type of context-free grammar where the right-hand side of each production rule consists of a single terminal symbol followed by zero or more non-terminal symbols. The document discusses GNF, provides an example grammar in GNF, describes two lemmas used to convert grammars to GNF, and shows the procedure and steps to convert an arbitrary context-free grammar into GNF. It also provides examples of converting several grammars to GNF and solving related problems.
The document discusses instruction set architecture (ISA), which is part of computer architecture related to programming. It defines the native data types, instructions, registers, addressing modes, and other low-level aspects of a computer's operation. Well-known ISAs include x86, ARM, MIPS, and RISC. A good ISA lasts through many implementations, supports a variety of uses, and provides convenient functions while permitting efficient implementation. Assembly language is used to program at the level of an ISA's registers, instructions, and execution order.
This document provides an overview of the chapters and content covered in a textbook on computer organization and architecture. The chapters cover digital logic circuits, digital components, data representation, register transfer and microoperations, basic computer organization and design, programming and instruction sets, control units, processor design, pipelining and parallel processing, arithmetic, input/output, and memory organization. Key concepts discussed include logic gates, boolean algebra, combinational and sequential circuits, registers, buses, arithmetic and logic operations, and memory.
The document discusses code generation in compilers. It describes the main tasks of the code generator as instruction selection, register allocation and assignment, and instruction ordering. It then discusses various issues in designing a code generator such as the input and output formats, memory management, different instruction selection and register allocation approaches, and choice of evaluation order. The target machine used is a hypothetical machine with general purpose registers, different addressing modes, and fixed instruction costs. Examples of instruction selection and utilization of addressing modes are provided.
This document discusses common bus systems for transferring data between registers in a computer. It explains that a bus structure uses common lines and control signals to connect one register at a time to transfer information. One method for constructing a common bus system is to use multiplexers to select the source register and place its binary information on the shared bus lines. Memory transfers refer to read and write operations that transfer data between memory and registers over the bus.
Interleaved memory is a design that spreads memory addresses across multiple memory banks to compensate for the relatively slow speed of DRAM. It increases bandwidth and improves performance by allowing different modules to be accessed independently and in parallel by different processing units like a CPU and hard disk. There are two address formats for interleaved memory: low order interleaving which spreads addresses across banks, and high order interleaving which uses high order bits as the module address.
Graphic hardware and software are used to create and edit images. Common hardware includes RAM, CPU, graphics cards, and hard drives which are used to store and process graphical files. Software like Photoshop, IPhoto, and Paint allow users to edit photos with different tools and levels of complexity. File formats like JPG, PNG, and TIFF determine image quality and compression for different uses. While powerful programs and devices provide robust editing, they can also be expensive and complex for beginners.
This document discusses pipelining in microprocessors. Pipelining allows a microprocessor to begin executing the next instruction before the previous one has finished. It works by breaking down the instruction process into stages, like an assembly line, where each stage performs part of the process on individual instructions in parallel. The key steps are to divide the process into subtasks that form pipeline stages, have each stage perform an operation on a set of operands, pass results to the next stage, and feed a new set of inputs to each stage. Pipelining can be classified by the level of processing, configuration, and type of instruction and data.
This document discusses computer architecture concepts such as machine language, assembly language, and instruction set architecture. It then describes different addressing architectures including memory-to-memory, register-to-register, and register-memory. Various addressing modes are also defined, such as implied, immediate, indirect, relative, and indexed addressing modes. Finally, the document briefly discusses stack instructions and the use of a stack pointer register.
This document discusses superscalar and super pipeline approaches to improving processor performance. Superscalar processors execute multiple independent instructions in parallel using multiple pipelines. Super pipelines break pipeline stages into smaller stages to reduce clock period and increase instruction throughput. While superscalar utilizes multiple parallel pipelines, super pipelines perform multiple stages per clock cycle in each pipeline. Super pipelines benefit from higher parallelism but also increase potential stalls from dependencies. Both approaches aim to maximize parallel instruction execution but face limitations from true data and other dependencies.
This chapter discusses computer abstractions and technology. It covers the hardware/software interface and how high-level programs are translated to machine code. The chapter also examines different types of computers like PCs, servers, and embedded systems. It describes how computers use layers of abstraction in both hardware and software. The chapter concludes by discussing performance measures like response time and throughput, and how techniques like parallelism can improve performance within power constraints.
Line Drawing Algorithms - Computer Graphics - NotesOmprakash Chauhan
Straight-line drawing algorithms are based on incremental methods.
In incremental method line starts with a straight point, then some fix incrementable is added to current point to get next point on the line and the same has continued all the end of the line.
Here are the key steps in designing a pipelined processor:
1. Identify the stages in the instruction execution pipeline, such as fetch, decode, execute, memory, writeback.
2. Associate functional units and resources with each pipeline stage. For example, register file access in execute stage.
3. Examine the datapath and control signals to ensure data and resource dependencies flow correctly through the pipeline with no conflicts.
4. Add pipeline registers between stages to break up instruction flow into discrete packets and enable overlapped execution.
5. Design pipeline control logic to assert appropriate control signals in each stage, such as read/write enables for register file.
6. Handle exceptions and hazards properly, e.g. by stalling, forwarding results, or flushing the pipeline on mispredicted branches.
The document discusses linear and nonlinear instruction pipelines. It describes different stages in a linear pipeline like fetch, decode, operand fetch, execute, and write results. It also discusses different types of dependencies like data and instruction dependencies that can occur in pipelines and different solutions to handle them like stalling, forwarding, branch prediction etc. The document further explains the design of nonlinear pipelines using concepts like latency sequence, collision vector, forbidden latencies and provides an example pipeline for multiply and add operations.
This document discusses pipelining in microprocessors. It describes how pipelining works by dividing instruction processing into stages - fetch, decode, execute, memory, and write back. This allows subsequent instructions to begin processing before previous instructions have finished, improving processor efficiency. The document provides estimated timing for each stage and notes advantages like quicker execution for large programs, while disadvantages include added hardware and potential pipeline hazards disrupting smooth execution. It then gives examples of how four instructions would progress through each stage in a pipelined versus linear fashion.
Pipelining is a speed-up technique in which multiple instructions are overlapped in execution on a processor. It is an important topic in Computer Architecture.
This slide relates the problem to a real-life scenario to make the concept easy to understand and to show the major inner mechanisms.
The document discusses instruction pipelining in CPUs. It explains that instruction pipelining achieves greater CPU performance by overlapping the execution of multiple instructions. It describes the different stages in a basic two-stage pipeline as fetch and execute. It then discusses how further dividing the pipeline into more stages, such as six stages for fetch, decode, calculate, fetch operands, execute, and writeback, can provide even higher performance. However, it notes conditional branches can reduce efficiency since the next instruction is unknown until the branch is resolved. Various techniques to handle branches like branch prediction, prefetching the target, and delayed branches are described to improve pipeline performance.
A fast graphic api for non-linear machine learning (Peng Cheng)
The document discusses a machine learning pipeline framework that uses graphic expressions and core operators to build and visualize machine learning workflows. It introduces common pipeline components like data, algorithms, and models. It then explains the core operators like feeds into/merge right and depends on/merge left that connect components, and provides examples of how to use them. Shorthand operators like rebase and commit are also defined that build upon the core operators.
This slide can be used to open a lesson on Arithmetic Sequences. With English as the language of instruction, it presents an initial problem that can guide students to discover the arithmetic sequence formula on their own.
Improving Test Team Throughput via Architecture by Dustin Williams (QA or the Highway)
A lot of modern testing teams are built from people with some automation experience, developers, and people who think code is something used to open a safe. These diverse backgrounds bring a diverse set of ideas, but don’t always find optimal division of work. With some fairly small changes in automated test design, we can leverage the best skills of all team members to not only improve throughput, but to end up with a better overall product. These design principles help isolate truly challenging code problems and help separate the concerns of test structure and test execution. If your team has ever said (with sad faces) “We’re still automating that”, then come discover how tomorrow you can exclaim “That’s Done!”
This document discusses the history and design of superscalar processors. It covers early prototypes from the 1980s, the introduction of commercial superscalar RISC and CISC processors in the late 1980s and early 1990s, and the tasks of superscalar processing including parallel decoding, instruction execution, and preserving sequential consistency. It also describes techniques like reservation stations, operand fetching policies, and approaches to instruction dispatching.
This is a re-post of old sharing. It was shared in 2007 on scribd (https://www.scribd.com/doc/7346/Reyes-and-Shader-Pipeline), about my some nearly 1st experience and study of Pixar's Renderman Reyes and shader pipeline.
The document discusses different aspects of computer organization to improve performance. It describes how a data path works by feeding two operands to an ALU and storing the output in a register. Modern processors implement features like pipelining and instruction-level parallelism to maximize instruction throughput. Pipelining involves splitting instruction processing across multiple stages to allow concurrent execution. Superscalar architectures contain multiple pipelines that can process multiple instructions simultaneously as long as they do not conflict over resources.
High Throughput, High Content Screening - Automating the Pipeline (Rajarshi Guha)
1) High throughput screening and high content screening can be combined into an integrated automated pipeline to obtain rich data at high speeds.
2) A proposed "wells to cells" workflow involves sequential high throughput screening followed by selective high content imaging of interesting wells.
3) Key challenges include deciding which wells receive further analysis and integrating data across multiple screening platforms.
This document discusses multivector and SIMD computers. It covers vector processing principles including vector instruction types like vector-vector, vector-scalar, and vector-memory instructions. It also discusses compound vector operations, vector loops and chaining. Finally, it discusses SIMD computer implementation models like distributed and shared memory, and SIMD instruction types.
This document presents a project presentation on pipelined computation. Pipelined computation involves dividing a problem into sequential tasks or stages. There are two types of pipelined applications: those with multiple complete problem instances, and those where pipeline stages overlap by passing data forward before previous stages finish. The document discusses how pipelining can provide speedup, and describes a distributed data system (DDS) architecture for pipelining computation between processes. It provides examples of pipelined addition and frequency filtering.
The document discusses pipelining in computer processors. It describes how pipelining can increase throughput by overlapping the execution of multiple instructions. It discusses the basic pipeline stages for a RISC instruction set, including fetch, decode, execute, memory access, and writeback. It also describes several types of pipeline hazards that can occur, such as structural hazards caused by resource conflicts, data hazards when instructions depend on previous results, and control hazards with branches. Forwarding techniques are presented to help address data hazards.
1) Branch instructions frequently affect pipeline performance as they are common in programs and can cause stalls.
2) Instruction fetch units try to reduce penalties for unconditional branches by predicting the target address, but conditional branches require waiting for the outcome.
3) Delayed branching aims to reduce penalties by moving instructions into branch delay slots, but only partially solves the problem.
4) Dynamic branch prediction using branch history tables can correctly predict over 90% of branches, avoiding pipeline stalls.
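The branch history tables in point 4 are commonly built from 2-bit saturating counters, which need two consecutive wrong guesses before flipping a stable prediction. A minimal sketch follows; the table size and PC indexing are illustrative assumptions, not any specific processor's design:

```python
class TwoBitPredictor:
    # counter states: 0-1 predict not-taken, 2-3 predict taken
    def __init__(self, table_bits=4):
        self.table = [1] * (1 << table_bits)   # start weakly not-taken
        self.mask = (1 << table_bits) - 1

    def predict(self, pc):
        # True means "predict taken" for the branch at this PC
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        # saturating increment on taken, decrement on not-taken
        i = pc & self.mask
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)
```

A loop branch that is taken nine times and then falls through is mispredicted only twice (the first iteration and the loop exit), which is how such tables reach the >90% accuracy cited above.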
The document discusses pipelining in computer processors. It explains that pipelining allows for overlapping execution of multiple instructions to improve processor throughput. An analogy is drawn to an assembly line in laundry - non-pipelined execution is like completing an entire load sequentially, while pipelined is like having different stages of multiple loads occurring in parallel. Pipelining is achieved by breaking instruction execution into discrete stages, such as fetch, decode, execute, memory, and writeback. This allows new instructions to enter the pipeline before previous ones have finished, improving instruction completion rate.
The document discusses instruction pipelining in computers. It begins with an analogy of pipelining laundry tasks to increase throughput. It then explains the concept of dividing instruction execution into stages (fetch, decode, execute, write) and executing instructions in parallel by having different stages work on different instructions. This allows higher instruction throughput. However, hazards like data dependencies between instructions, branches, cache misses can cause the pipeline to stall, reducing performance. Various techniques are discussed to handle hazards and maximize the benefits of pipelining.
Pipeline Hazards can be classified into three types: structural hazards caused by hardware resource conflicts, data hazards caused when an instruction depends on the results of a previous instruction, and control hazards from conditional branches. Structural hazards arise from limited hardware resources like register files and memory ports. Data hazards include RAW, WAW, and WAR and are resolved by stalling or forwarding. Forwarding minimizes stalls by directly connecting new values to the next stage.
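The three data-hazard classes can be detected mechanically by comparing each instruction's read and write sets. A minimal sketch, using register-name sets only and ignoring memory operands:

```python
def classify_hazards(first, second):
    """Hazards between two instructions where `first` issues before `second`.
    Each instruction is a pair (writes, reads) of register-name sets."""
    w1, r1 = first
    w2, r2 = second
    hazards = []
    if w1 & r2: hazards.append("RAW")  # second reads what first writes
    if r1 & w2: hazards.append("WAR")  # second overwrites what first reads
    if w1 & w2: hazards.append("WAW")  # both write the same register
    return hazards
```

For ADD r1,r2,r3 followed by SUB r4,r1,r5 this reports a RAW hazard; forwarding resolves exactly that case by routing the ALU result directly to the dependent instruction's input instead of waiting for writeback.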
This document is a chapter summary for "CSE539: Advanced Computer Architecture" taught by Sumit Mittu, Assistant Professor at Lovely Professional University. The chapter discusses parallel computer models and covers topics like the evolution of computing through generations of computers, parallel computer classifications like multiprocessors and multicomputers, and theoretical parallel models like PRAM. It provides examples of parallel systems from each generation and model. The goal is to introduce students to parallel architectures and taxonomy.
The document provides an introduction to data envelopment analysis (DEA). DEA is a method used to evaluate the efficiency of decision making units (DMUs) that may have multiple inputs and outputs. It determines the relative efficiency of each DMU by solving a series of linear programming problems. The document discusses the constant returns to scale (CRS) and variable returns to scale (VRS) models, and provides an example to formulate a DEA problem using the CRS assumption to calculate efficiency scores for four mobile phones as DMUs based on price, storage space, camera quality, and looks.
1) The document discusses production function analysis using stochastic frontier models like the translog and Cobb-Douglas functions.
2) It explains the specifications of the translog and Cobb-Douglas production functions and how they are used to estimate production elasticities and returns to scale.
3) The stochastic frontier approach models production as equal to the deterministic production function plus noise, minus inefficiency, allowing estimation of technical efficiency for each firm.
PPCE 1324.pptx process planning and cost estimation (AakashR49)
The document discusses process planning and cost estimation. It covers topics like the definition of process planning, the objectives and details required for process planning like drawings, materials, and manufacturing processes. It also discusses geometric tolerances, unilateral and bilateral tolerances. The document describes manual and computer-aided process planning approaches. It provides steps in process planning and discusses selection of production equipment and tools based on the volume of production and part specifications.
The document discusses various topics related to project management fundamentals including project cash flow, time value of money, project selection criteria, project scheduling, critical path method (CPM), Gantt charts, and staffing & rescheduling. It provides definitions and explanations of key concepts such as cost types, discount rates, numeric and non-numeric selection models, precedence relationships, critical paths, slack times, and resource smoothing. Examples are given to illustrate how to construct Gantt charts from CPM/PERT analysis and optimize schedules through staff reallocation.
The document discusses the Critical Path Method (CPM) and Program Evaluation and Review Technique (PERT) for project scheduling. CPM determines the minimum project duration when activity times are known with certainty, while PERT estimates the probability of completing on time when activity times are uncertain. Both methods represent projects as networks and identify critical paths that must be followed to complete on schedule.
Lec1 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Intro (Hsien-Hsin Sean Lee, Ph.D.)
This document provides an overview of ECE2030 Introduction to Computer Engineering course. It introduces the instructor, Prof. Hsien-Hsin Sean Lee, and lists course details such as materials, assignments, exams, grading policy. It outlines the course objectives which include digital design principles like number systems, Boolean algebra, combinational and sequential logic. The focus will be on levels below system architecture like microarchitecture, gates and transistor levels.
Topology Optimization
Topology optimization is concerned with material distribution and how the members within a structure are connected. It treats the “equivalent density” of each element as a design variable.
The solver calculates an equivalent density for each element, where 1 is equivalent to 100% material, while 0 is equivalent to no material in the element. The solver then seeks to assign elements that have a low stress value a lower equivalent density before analyzing the effect on the remaining structure. In this way extraneous elements tend towards a density of 0, with the optimum design tending towards 1. As a designer, you will need to exercise your judgment. For example, you may decide that you will omit material from all (finite) elements whose density is less than 0.3 (or 30%). Using an iso-plot of element densities helps to visualize the “remaining” structure as elements with a density below this threshold can be masked leaving behind the optimum design. Then you will need to take this geometry back to your CAD modeler, smooth it out (that is, use geometrically regular edges or surfaces, etc.) and re-evaluate the design for stresses, displacements, frequencies etc..
This document outlines the schedule and content for an industrial engineering and supply chain management workshop. The workshop covers topics like production strategy, digital twins, product analysis, lean production, robotics, and includes presentations by student teams. The schedule includes sessions on workshop introduction, production strategy, recycling technology, product analysis, team implementation, robotics, and oral presentations. Students are instructed to bring laptops and present deliverables at the end of the workshop, including a commercial video, presentation, and defense of their project.
This document discusses feature engineering techniques for data analysis. It covers feature selection, construction, and engineering. Specific techniques discussed include feature imputation, handling outliers, binning, log transforms, one-hot encoding, grouping, splitting, scaling, and extracting date features. The document provides examples and explanations of these techniques to transform raw data into more useful features for machine learning models.
Computer aided process design and simulation (Cheg.pptx) (PaulosMekuria)
knowledge-based system for conceptual design
3. Aspen Process Economic Analyzer: economic evaluation of process alternatives, sensitivity analysis, optimization.
4. Aspen Batch: batch process design and scheduling.
5. Aspen Custom Modeler: object-oriented environment for rigorous modeling of non-standard unit operations.
6. Aspen Process Optimization: steady-state and dynamic optimization of processes.
7. Aspen PIMS: plant information management system.
8. Aspen Petroleum Supply Chain: supply chain modeling.
9. Aspen One: plant-wide real-time optimization.
10. Aspen InfoPlus.21: plant information management.
This document discusses computer architecture performance, including metrics like execution time, throughput, and instructions per cycle (IPC). It provides examples of calculating the cycles per instruction (CPI) for different instruction types and evaluating potential design changes based on their impact on CPI and overall performance. The principles of locality and Amdahl's Law, which states that speedups from parallelism are limited by the serial fraction of a program, are also covered.
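The two calculations mentioned above, a weighted CPI from an instruction mix and Amdahl's Law, are small enough to state directly. The mix numbers in the usage note are illustrative, not taken from any benchmark:

```python
def average_cpi(mix):
    # mix: list of (fraction_of_instructions, cycles_per_instruction)
    return sum(frac * cpi for frac, cpi in mix)

def amdahl_speedup(enhanced_fraction, enhancement_speedup):
    # Amdahl's Law: the unenhanced (serial) fraction bounds overall speedup
    return 1.0 / ((1.0 - enhanced_fraction)
                  + enhanced_fraction / enhancement_speedup)
```

With a mix of 50% ALU ops at 1 cycle, 30% loads at 2, and 20% branches at 3, the average CPI is 1.7; speeding up 90% of a program by 10x yields only about a 5.3x overall speedup, illustrating the serial bottleneck.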
The document discusses the topics covered in a database technologies course, including relational algebra operations. It provides examples and explanations of relational algebra concepts like selection, projection, join, union, difference, and cartesian product. It also discusses limitations of relational algebra in expressing complex queries involving transitive closure. The document contains practice questions related to relational algebra operations at the end.
The document discusses computer integrated manufacturing. It defines CAM as using computer systems to plan, manage, and control manufacturing operations through direct or indirect interface with production processes. The document outlines the benefits of CAM such as increased productivity, flexibility, communication, reliability, and reduced costs and personnel requirements. It also discusses topics like group technology, computer-aided process planning, manufacturing resource planning, capacity requirements planning, and Just in Time manufacturing.
This document provides information about a computer organization and design course. It includes details about the instructor, tutors, grade determination, textbook, and course content. The course will cover major computer system components, CPU design, techniques for improving performance and energy efficiency, and multiprocessor architecture. It aims to teach how computer hardware and software interact and how design choices affect performance.
Chapter 10_Project Scheduling Using Critical Path Method (CPM).pdf (ssuser54fc7a)
Project Scheduling Basics: The document introduces the basics of project scheduling, including network-based scheduling methods like Gantt charts and Critical Path Method (CPM).
Scheduling Techniques: It explains different scheduling techniques such as bar charts for planning and control, and Gantt charts for precise activity sequencing.
Critical Path Method (CPM): The document details the CPM, which helps identify the critical set of activities, calculate early and late start times, and determine the minimum project duration.
Float Calculation: It covers the concept of float in project scheduling, which is the amount of time an activity can be delayed without affecting the overall project timeline. Different types of float are discussed, including total, free, and interfering float.
The document discusses automated university timetabling, which involves scheduling teachers, classes, timeslots, and rooms while satisfying various constraints. It covers key concepts like hard and soft constraints, and feasible timetables. Common techniques for timetabling include constructive heuristics to first find a feasible solution, then using local search methods like neighborhood search for optimization. The complexity of evaluating constraints can range from linear to quadratic time depending on the approach.
solve 6, quiz 2 (arihantcomputersddn)
link of book: http://www.irccyn.ec-nantes.fr/~martinet/Mobrob/FundamentalsofVehicleDynamics.pdf
Problem 6: Propose a currently non-commercially available car mechatronics system. Describe the sensors, actuators and control system you intend to use. Explain the purpose of this innovation and specify what the added value is for the user [i.e. why would anyone want to buy it?].
Solution
COMPUTER AIDED DESIGN OF MECHATRONIC SYSTEMS (from Matuško&Koloni, Mechatronics Systems 2014/2015): use of advanced computer tools in the product development process is necessary to meet the market need for fast product development. These tools allow for the verification of the design in all phases of the development process (virtual prototypes). Their main drawback is that they are mostly limited to specific domains; the main requirements on computer tools for mechatronic system design are that they apply to systems from various domains and can simultaneously use different system representations (block diagram, physical model, bond graph).
CLASSICAL (NON-MECHATRONIC) OPTIMIZATION: the classical approach to optimization during the design process assumes that the parameters of the process, sensors and actuators are known and constant, and only the controller parameters are optimized.
OPTIMIZATION IN MECHATRONIC SYSTEMS: since the main characteristic of the mechatronic design approach is the simultaneous design of all system components, the optimization process is performed not only on the controller parameters but also on the parameters of the other parts of the system.
MODELS OF MECHATRONIC SYSTEMS: physical models; bond graphs; control-oriented models (block diagrams, Bode diagrams, Nyquist diagrams, state-space models); time-domain models; animation; C-code.
PHYSICAL MODELS: describe the system behavior on both the functional and the structural level; model parameters are real physical parameters; generally, physical models are not suitable for control system design; building such models requires deep insight into the physical behavior of the system.
BOND GRAPHS: a universal representation of system behavior, independent of the system's nature; based on power-flow analysis; two quantities define the system behavior and the power flow within the system (and with its environment).
A lecture slide on the Fetch and Execute Cycle and Machine Cycle Timing Diagrams as outlined from the book Microprocessors and MIcrocomputers by John Uffenbeck
The document is a lecture on various data models for databases, including the object oriented model, network model, hierarchical model, and relational model. It provides details on the concepts, structures, relationships, and query languages associated with each model. It also discusses mapping models to files and includes examples to illustrate concepts. The lecture is presented by Sumit Mittu and includes 41 slides.
This document contains lecture notes on database design and the entity-relationship (E-R) model. It introduces key concepts of the E-R model like entities, attributes, relationships and relationship types. It describes the database design process and how the E-R model is used to conceptualize the real-world domain. Constraints like cardinalities and participation are explained. The document also covers E-R diagrams, mapping the E-R schema to relations and common design issues. The lecture notes are for an introductory database management systems course taught by Sumit Mittu.
This document provides instructions for installing Oracle 12c Enterprise database software. It outlines downloading the installation files from Oracle.com, extracting the files to a local folder, and running the setup file to start the installation wizard. The wizard will guide the user through the installation process, including selecting installation options and confirming or modifying configuration settings. Once complete, a root configuration script must be run and the installation will be finished.
This document contains notes from a database management systems course taught by Sumit Mittu. It discusses E.F. Codd's 12 rules for relational database management systems and the concepts of database normalization. The notes cover the different normal forms including 1NF, 2NF, 3NF, BCNF, 4NF and 5NF. It also briefly discusses denormalization and provides examples to illustrate database normalization concepts.
This document discusses software for parallel programming, including parallel programming models, languages, compilers, and environments. It covers shared memory and message passing models, as well as data parallel programming. It also discusses language features that support parallelism, parallel language constructs, optimizing compilers for parallelism, and integrated parallel programming environments. Key topics are parallel debugging, performance monitoring, and program visualization tools. Representative parallel programming environments including Cray, Intel Paragon, and CM-5 software are also summarized.
This document discusses dataflow architectures and is divided into several sections. It begins by covering the evolution of dataflow computers and describing dataflow graphs. It then distinguishes between static and dynamic dataflow computers, describing examples of each. The document outlines pure dataflow machines such as the TTDA as well as explicit token store machines like the Monsoon. Finally, it discusses hybrid and unified architectures that combine aspects of von Neumann and dataflow models, and compares dataflow and control flow processing.
This document provides an overview of Chapter 7 - Multiprocessors and Multicomputers from the book "Advanced Computer Architecture - Parallelism, Scalability, Programmability" by Hwang and Jotwani. The chapter covers multiprocessor system interconnects including bus-based, crossbar, and multistage interconnects. It discusses cache coherence and synchronization mechanisms such as snoopy protocols and directory-based protocols. Generations of multicomputers and message passing schemes like store-and-forward routing and wormhole routing are also summarized.
The document discusses performance evaluation of parallel computers. It defines key metrics like parallel runtime, speedup and efficiency used to evaluate parallel algorithms. Speedup is the ratio of sequential to parallel runtime and measures how faster a program runs in parallel. Efficiency measures processor utilization. The document also discusses performance measures, benchmarks, sources of parallel overhead, and performance models like Amdahl's law, Gustafson's law and Sun & Ni's law that define relationships between speedup, processors and problem size. It concludes with the scalability metric and isoefficiency function to measure a system's ability to efficiently use more processors by increasing problem size.
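To complement Amdahl's fixed-workload view summarized above, Gustafson's scaled-speedup law and the efficiency metric are direct transcriptions of the standard formulas:

```python
def gustafson_speedup(serial_fraction, n):
    # fixed-time / scaled-workload model: the parallel part of the work
    # grows with the n processors while the serial part stays constant
    return n - serial_fraction * (n - 1)

def efficiency(speedup, n):
    # fraction of the n processors' aggregate capacity actually used
    return speedup / n
```

With a 10% serial fraction, Amdahl's law caps speedup near 5.3 on 10 processors, while Gustafson's scaled model gives 9.1; the gap is why problem size matters in scalability analysis (cf. the isoefficiency function).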
This presentation covers the basics of PCOS, its pathology and treatment, along with the Ayurvedic correlation of PCOS and the Ayurvedic line of treatment described in the classics.
It describes the bony anatomy of the hip, including the femoral head, acetabulum and labrum, and discusses the capsule, ligaments and muscles that act on the hip joint. The range of motion is outlined, and factors affecting hip joint stability and weight transmission through the joint are summarized.
Thinking of getting a dog? Be aware that breeds like Pit Bulls, Rottweilers, and German Shepherds can be loyal but also dangerous. Proper training and socialization are crucial to preventing aggressive behavior. Ensure safety by understanding their needs and always supervising interactions. Stay safe, and enjoy your furry friends!
A review of the growth of the Israel Genealogy Research Association Database Collection over the last 12 months. Our collection has now passed the 3 million mark and is still growing. See which archives have contributed the most, the different types of records we have, and which years have had records added. You can also see what we have planned for the future.
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organised by the Excellence Foundation for South Sudan on 08th and 09th June 2024 from 1 PM to 3 PM on each day.
This slide is special for master students (MIBS & MIFB) in UUM. Also useful for readers who are interested in the topic of contemporary Islamic banking.
How to Build a Module in Odoo 17 Using the Scaffold Method (Celine George)
Odoo provides an option for creating a module by using a single line command. By using this command the user can make a whole structure of a module. It is very easy for a beginner to make a module. There is no need to make each file manually. This slide will show how to create a module using the scaffold method.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion (TechSoup)
Let’s explore the intersection of technology and equity in the final session of our DEI series. Discover how AI tools, like ChatGPT, can be used to support and enhance your nonprofit's DEI initiatives. Participants will gain insights into practical AI applications and get tips for leveraging technology to advance their DEI goals.
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I... (NelTorrente)
This research concludes that while the readiness of teachers in Caloocan City to implement the MATATAG Curriculum is generally positive, targeted efforts in professional development, resource distribution, support networks, and comprehensive preparation can address the existing gaps and ensure successful curriculum implementation.
1. CSE539: Advanced Computer Architecture
Chapter 6
Pipelining and Superscalar
Techniques
Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani
Sumit Mittu
Assistant Professor, CSE/IT
Lovely Professional University
sumit.12735@lpu.co.in
2. In this chapter…
• Linear Pipeline Processors
• Non-linear Pipeline Processors
• Instruction Pipeline Design
• Arithmetic Pipeline Design
• Superscalar Pipeline Design
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University
3. LINEAR PIPELINE PROCESSORS
• Linear Pipeline Processor
o (Definition)
• Models of Linear Pipeline
o Synchronous Model
o Asynchronous Model
o (Corresponding reservation tables)
• Clocking and Timing Control
o Clock Cycle
o Pipeline Frequency
o Clock Skewing
o Flow-through Delay
o Speedup, Efficiency and Throughput
• Optimal number of Stages and Performance-Cost Ratio (PCR)
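The speedup, efficiency and throughput of a k-stage linear pipeline processing n tasks with clock period τ have simple closed forms. A minimal sketch in Python; the values k = 4, n = 64, τ = 10 ns are illustrative only:

```python
# Performance of a k-stage linear pipeline executing n tasks
# with clock period tau.

def pipeline_metrics(k, n, tau):
    t_pipe = (k + n - 1) * tau   # pipelined execution time: first task fills
                                 # the pipe (k cycles), then one result/cycle
    t_seq = n * k * tau          # equivalent non-pipelined execution time
    speedup = t_seq / t_pipe     # S_k = n*k / (k + n - 1)
    efficiency = speedup / k     # E_k = S_k / k
    throughput = n / t_pipe      # H_k = n / [(k + n - 1) * tau] = E_k * f
    return speedup, efficiency, throughput

s, e, h = pipeline_metrics(k=4, n=64, tau=10e-9)
# As n grows, the speedup approaches k and the efficiency approaches 1.
```

Note how the three measures are tied together: H_k equals E_k times the clock frequency f = 1/τ, so maximizing efficiency also maximizes throughput.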
6. NON-LINEAR PIPELINE PROCESSORS
• Dynamic Pipeline
o Static v/s Dynamic Pipeline
o Streamline connection, feed-forward connection and feedback connection
• Reservation and Latency Analysis
o Reservation tables
o Evaluation time
• Latency Analysis
o Latency
o Collision
o Forbidden latencies
o Latency Sequence, Latency Cycle and Average Latency
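The forbidden latencies and the initial collision vector can be read mechanically off a reservation table: a latency d is forbidden when some stage is marked at two cycles exactly d apart, since a second initiation d cycles later would collide on that stage. A small sketch, using a hypothetical 3-stage, 6-cycle reservation table:

```python
# Latency analysis for a non-linear pipeline: derive the forbidden
# latencies and the initial collision vector from a reservation table.
# The table below is a made-up example, not one from the book.

table = [
    [1, 0, 0, 0, 0, 1],   # stage 1 used at cycles 0 and 5
    [0, 1, 0, 1, 0, 0],   # stage 2 used at cycles 1 and 3
    [0, 0, 1, 0, 1, 0],   # stage 3 used at cycles 2 and 4
]

def forbidden_latencies(table):
    # Collect every distance between two marks in the same row.
    forbidden = set()
    for row in table:
        used = [t for t, x in enumerate(row) if x]
        for i in range(len(used)):
            for j in range(i + 1, len(used)):
                forbidden.add(used[j] - used[i])
    return forbidden

def collision_vector(table):
    # Bit i (counted from the right, i = 1 .. n-1 for an n-column
    # table) is 1 iff latency i is forbidden.
    n = len(table[0])
    f = forbidden_latencies(table)
    return ''.join('1' if i in f else '0' for i in range(n - 1, 0, -1))

fl = forbidden_latencies(table)   # {2, 5}
cv = collision_vector(table)      # '10010'
```

The collision vector is the starting state for building the state diagram from which greedy cycles and the minimal average latency are derived.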
10. INSTRUCTION PIPELINE DESIGN
• Instruction Execution Phases
o E.g. Fetch, Decode, Issue, Execute, Write-back
o In-order Instruction issuing and Reordered Instruction issuing
• E.g. X = Y + Z, A = B × C
• Mechanisms/Design Issues for Instruction Pipelining
o Pre-fetch Buffers
o Multiple Functional Units
o Internal Data Forwarding
o Hazard Avoidance
• Dynamic Scheduling
• Branch Handling Techniques
12. INSTRUCTION PIPELINE DESIGN
• Fetch: fetches instructions from memory; ideally one per cycle
• Decode: reveals the instruction operations to be performed and identifies the resources needed
• Issue: reserves the resources and reads the operands from registers
• Execute: actual processing of the operations as indicated by the instruction
• Write Back: writes results into the registers
15. INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
• Pre-fetch Buffers
o Sequential Buffers
o Target Buffers
o Loop Buffers
16. INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
• Multiple Functional Units
o Reservation Station and Tags
o Slow-station as Bottleneck stage
• Subdivision of Pipeline Bottleneck stage
• Replication of Pipeline Bottleneck stage
• (Example to be discussed)
18. INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
• Internal Forwarding and Register Tagging
o Internal Forwarding:
• A “short-circuit” technique to replace unnecessary memory accesses by register-register transfers in a sequence of fetch-arithmetic-store operations
o Register Tagging:
• Use of tagged registers, buffers and reservation stations for exploiting concurrent activities among multiple arithmetic units
o Store-Fetch Forwarding
• (M ← R1, R2 ← M) replaced by (M ← R1, R2 ← R1)
o Fetch-Fetch Forwarding
• (R1 ← M, R2 ← M) replaced by (R1 ← M, R2 ← R1)
o Store-Store Overwriting
• (M ← R1, M ← R2) replaced by (M ← R2)
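The three forwarding rules can be sketched as a toy peephole transformation over pairs of moves. The (destination, source) tuple encoding below is an assumption chosen for illustration, not the book's notation:

```python
# Toy sketch of internal data forwarding. Each move is a (dest, src)
# pair, e.g. ('M', 'R1') means M <- R1 (a store). The function rewrites
# one pair of consecutive moves per the three rules on the slide.

def forward(op1, op2):
    (d1, s1), (d2, s2) = op1, op2
    # Store-store overwriting: (M <- R1, M <- R2)  =>  (M <- R2)
    if d1 == 'M' and d2 == 'M':
        return [op2]
    # Store-fetch forwarding: (M <- R1, R2 <- M)  =>  (M <- R1, R2 <- R1)
    if d1 == 'M' and s2 == 'M':
        return [op1, (d2, s1)]
    # Fetch-fetch forwarding: (R1 <- M, R2 <- M)  =>  (R1 <- M, R2 <- R1)
    if s1 == 'M' and s2 == 'M':
        return [op1, (d2, d1)]
    return [op1, op2]

assert forward(('M', 'R1'), ('R2', 'M')) == [('M', 'R1'), ('R2', 'R1')]
assert forward(('R1', 'M'), ('R2', 'M')) == [('R1', 'M'), ('R2', 'R1')]
assert forward(('M', 'R1'), ('M', 'R2')) == [('M', 'R2')]
```

In each rewritten pair, a slow memory access is replaced by a fast register-register transfer (or dropped entirely), which is the point of the technique.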
19.–20. INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
• Internal Forwarding and Register Tagging (figures)
21. INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
• Hazard Detection and Avoidance
o Domain or Input Set of an instruction
o Range or Output Set of an instruction
o Data Hazards: RAW, WAR and WAW
o Resolution using the Register Renaming approach
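With the domain (input set) and range (output set) defined as above, the three data hazards reduce to set intersections between an earlier instruction i and a later instruction j. A minimal sketch; the Python-set encoding is assumed for illustration:

```python
# Detect data hazards between instruction i (earlier in program order)
# and instruction j (later), given their domains and ranges.

def hazards(domain_i, range_i, domain_j, range_j):
    found = set()
    if range_i & domain_j:    # j reads a location i writes
        found.add('RAW')
    if domain_i & range_j:    # j writes a location i reads
        found.add('WAR')
    if range_i & range_j:     # both write the same location
        found.add('WAW')
    return found

# i: R1 <- R2 + R3;  j: R4 <- R1 + R5  (j consumes i's result)
assert hazards({'R2', 'R3'}, {'R1'}, {'R1', 'R5'}, {'R4'}) == {'RAW'}
```

Register renaming removes the WAR and WAW cases by giving each write a fresh physical register, leaving only the true (RAW) dependences.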
22. INSTRUCTION PIPELINE DESIGN
Dynamic Instruction Scheduling
• Idea of Static Scheduling
o Compiler based scheduling strategy to resolve Interlocking among instructions
• Dynamic Scheduling
o Tomasulo’s Algorithm (Register-Tagging Scheme)
• Hardware based dependence-resolution
o Scoreboarding Technique
• Scoreboard: the centralized control unit
• A kind of data-driven mechanism
25. INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
• Branch Taken, Branch Target, Delay Slot
• Effect of Branching
o Parameters:
• k: number of stages in the pipeline
• n: total number of instructions or tasks
• p: percentage of branch instructions over n
• q: percentage of successful branch instructions (branch taken) over p
• b: delay slot
• τ: pipeline cycle time
o Branch Penalty = q · (p · n) · bτ = pqnbτ
o Effective Execution Time:
• Teff = [k + (n-1)]τ + pqnbτ = [k + (n-1) + pqnb]τ
26. INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
• Effect of Branching
o Effective Throughput:
• Heff = n/Teff
• Heff = n / {[k + (n-1) + pqnb]τ} = nf / [k + (n-1) + pqnb]
• As n → ∞ and b = k-1
o H*eff = f / [pq(k-1)+1]
• If p=0 and q=0 (no branching occurs)
o H**eff = f = 1/τ
o Performance Degradation Factor
• D = 1 – H*eff / f = pq(k-1) / [pq(k-1)+1]
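The branching formulas are easy to check numerically. A small sketch; the parameter values below are illustrative only, not from the book:

```python
# Effect of branching: effective execution time, effective throughput,
# the limiting throughput, and the performance degradation factor.

k, n, p, q = 8, 10_000, 0.2, 0.6     # stages, instructions, branch stats
b, tau = k - 1, 20e-9                # delay slot b = k-1, cycle time tau
f = 1 / tau                          # pipeline clock frequency

t_eff = (k + (n - 1) + p * q * n * b) * tau        # T_eff
h_eff = n / t_eff                                  # H_eff
h_star = f / (p * q * (k - 1) + 1)                 # limit as n -> infinity
d = p * q * (k - 1) / (p * q * (k - 1) + 1)        # degradation factor D
# With no branching (p = q = 0), h_star collapses back to f = 1/tau.
```

With these numbers pq(k-1) = 0.84, so nearly half the ideal throughput (D ≈ 0.46) is lost to branching, which is why the branch-handling techniques on the following slides matter.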
28. INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
• Branch Prediction
o Static Branch Prediction: based on branch code types
o Dynamic Branch prediction: based on recent branch history
• Strategy 1: Predict the branch direction based on information found at decode stage.
• Strategy 2: Use a cache to store target addresses at effective address calculation stage.
• Strategy 3: Use a cache to store target instructions at fetch stage
o Branch Target Buffer Organization
• Delayed Branches
o A delayed branch of d cycles allows at most d-1 useful instructions to be executed following the branch taken.
o Execution of these instructions should be independent of the branch instruction to achieve a zero branch penalty.
31. ARITHMETIC PIPELINE DESIGN
Computer Arithmetic Operations
• Finite-precision arithmetic
• Overflow and Underflow
• Fixed-Point operations
o Notations:
• Signed-magnitude, one’s complement and two’s complement notation
o Operations:
• Addition: (n bit, n bit) → (n bit) sum, 1-bit output carry
• Subtraction: (n bit, n bit) → (n bit) difference
• Multiplication: (n bit, n bit) → (2n bit) product
• Division: (2n bit, n bit) → (n bit) quotient, (n bit) remainder
32. ARITHMETIC PIPELINE DESIGN
Computer Arithmetic Operations
• Floating-Point Numbers
o X = (m, e) representation
• m: mantissa or fraction
• e: exponent with an implied base or radix r.
• Actual value X = m × r^e
o Operations on numbers X = (mx, ex) and Y = (my, ey):
• Addition: (mx × r^(ex−ey) + my, ey)
• Subtraction: (mx × r^(ex−ey) − my, ey)
• Multiplication: (mx × my, ex + ey)
• Division: (mx / my, ex − ey)
• Elementary Functions
o Transcendental functions like: Trigonometric, Exponential, Logarithmic, etc.
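The addition rule above (scale the smaller-exponent mantissa, then add, keeping the larger exponent) can be sketched directly on (m, e) pairs. This toy version skips the normalization and rounding steps a real floating-point adder pipeline would perform:

```python
# Floating-point addition on (mantissa, exponent) pairs with an
# implied radix r, following the slide's rule. Normalization and
# rounding of the result are deliberately omitted.

def fp_add(x, y, r=10):
    (mx, ex), (my, ey) = x, y
    if ex > ey:                        # swap so that ey >= ex; the shift
        (mx, ex), (my, ey) = y, x      # then scales the smaller operand
    return (mx * r ** (ex - ey) + my, ey)

# 3*10^0 + 5*10^1 = 53, i.e. (5.3, 1) in (m, e) form
m, e = fp_add((3, 0), (5, 1))
assert e == 1 and abs(m - 5.3) < 1e-9
```

The exponent comparison, mantissa alignment shift, add, and (omitted here) normalization map naturally onto the stages of a static floating-point add pipeline.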
33. ARITHMETIC PIPELINE DESIGN
Static Arithmetic Pipelines
• Separate units for fixed-point operations and floating-point operations
• Scalar and Vector Arithmetic Pipelines
• Uni-functional or Static Pipelines
• Arithmetic Pipeline Stages
o Majorly involve hardware to perform: Add and Shift micro-operations
o Addition using: Carry Propagation Adder (CPA) and Carry Save Adder (CSA)
o Shift using: Shift Registers
• Multiplication Pipeline Design
o E.g. To multiply two 8-bit numbers yielding a 16-bit product using a Wallace tree of CSAs with a final CPA.
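The CSA/CPA split rests on the carry-save idea: three operands are compressed into a sum word and a carry word in constant time, with no carry propagation; only one final carry-propagate addition is needed. A minimal sketch on Python integers:

```python
# One carry-save adder (CSA) step: compress three operands into a
# sum word and a carry word without propagating carries. A Wallace
# tree chains such steps to reduce all partial products of a multiply
# to two words, which a single final CPA then adds.

def carry_save_add(a, b, c):
    s = a ^ b ^ c                                 # bitwise sum, carries ignored
    carry = ((a & b) | (b & c) | (a & c)) << 1    # saved carries, shifted left
    return s, carry

s, carry = carry_save_add(13, 27, 6)
assert s + carry == 13 + 27 + 6   # the final CPA step recovers the true sum
```

Because each CSA level finishes in one gate delay regardless of word length, the tree's depth grows only logarithmically in the number of partial products, which is what makes the pipelined multiplier fast.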
34. ARITHMETIC PIPELINE DESIGN
Static Arithmetic Pipelines (figure)
39. SUPERSCALAR PIPELINE DESIGN
• Pipeline Design Parameters
o Pipeline cycle, Base cycle, Instruction issue rate, Instruction issue Latency, Simple Operation Latency
o ILP to fully utilize the pipeline
• Superscalar Pipeline Structure
• Data and Resource Dependencies
• Pipeline Stalling
• Superscalar Pipeline Scheduling
o In-order Issue and in-order completion
o In-order Issue and out-of-order completion
o Out-of-order Issue and out-of-order completion
• Superscalar Performance
40. SUPERSCALAR PIPELINE DESIGN
Parameter                       | Base Scalar Processor | Superscalar Processor (degree = K)
Pipeline Cycle                  | 1 (base cycle)        | K
Instruction Issue Rate          | 1                     | K
Instruction Issue Latency       | 1                     | 1
Simple Operation Latency        | 1                     | 1
ILP to fully utilize pipeline   | 1                     | K
49. SUPERSCALAR PIPELINE DESIGN
• Time required by the base scalar machine:
o T(1,1) = k + N – 1
• The ideal execution time required by an m-issue superscalar machine:
o T(m,1) = k + (N – m)/m
o Where:
• k is the time required to execute the first m instructions through the m pipelines of k stages simultaneously
• the second term is the time required to execute the remaining N – m instructions, m per cycle, through the m pipelines
• The ideal speedup of the superscalar machine:
o S(m,1) = T(1,1)/T(m,1) = m(N + k – 1)/[N + m(k – 1)]
• As N → ∞, S(m,1) → m
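The timing model above is easy to evaluate numerically. A small sketch; the parameter values are illustrative only:

```python
# Ideal speedup of an m-issue superscalar machine over the base
# scalar machine, using the slide's timing model (times in cycles).

def superscalar_speedup(m, k, n):
    t_base = k + n - 1            # T(1,1): fill k stages, then 1/cycle
    t_super = k + (n - m) / m     # T(m,1): first m fill, then m/cycle
    return t_base / t_super       # S(m,1) = m(n + k - 1) / [n + m(k - 1)]

s = superscalar_speedup(m=4, k=5, n=1000)
# For large n the speedup approaches the issue width m (here, 4).
assert 3.9 < s < 4.0
```

Even at n = 1000 the speedup has not reached m = 4: the k-stage fill time and the partial final issue group keep the ideal limit asymptotic, and in practice dependencies and stalls lower it further.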