This document discusses parallel processing and various techniques used to achieve it, including pipelining and vector processing. It describes different classifications of parallel computers based on the number of instruction and data streams. It also explains single instruction single data (SISD), multiple instruction single data (MISD), single instruction multiple data (SIMD), and multiple instruction multiple data (MIMD) computer architectures. The document further discusses pipelining techniques used to improve performance in SISD systems, and provides details about arithmetic, instruction, and RISC pipelines. It also covers vector processing techniques used in SIMD systems like array processors and systolic arrays.
The document discusses parallel processing techniques in computer systems including pipelining and vector processing. It describes different types of parallel architectures like SISD, SIMD, MISD, and MIMD systems. Specific examples of parallel techniques discussed include arithmetic pipelining, instruction pipelining, vector processors, and array processors. The key benefits of these techniques are exploiting parallelism at different levels to improve computational speed and overcome limitations of conventional von Neumann architectures.
This document discusses parallel processing techniques in computer systems, including pipelining and vector processing. It provides information on parallel processing levels and Flynn's classification of computer architectures. Pipelining is described as a technique to decompose sequential processes into overlapping suboperations to improve computational speed. Vector processing involves performing the same operation on multiple data elements simultaneously. The document outlines various pipeline designs and hazards that can occur, such as structural hazards from resource conflicts and data hazards from data dependencies.
CS304PC: Computer Organization and Architecture, Session 33 demo 1 ppt.pdf, by Asst. Prof. M. Gokilavani
The document discusses parallel processing techniques in computer systems including pipelining and vector processing. It covers parallel processing classifications including SISD, SIMD, MISD and MIMD architectures. It also describes pipelining techniques used to improve processor performance through parallel execution of instruction phases including arithmetic and instruction pipelining. Vector processing allows parallel execution of same operations on multiple data elements.
This document discusses parallel processing techniques in computer systems including pipelining and vector processing. It covers parallel computer architectures such as SISD, SIMD, MISD, and MIMD systems. The document also describes pipelining techniques including arithmetic pipelining and instruction pipelining. It discusses hazards that can occur in pipelined systems such as structural hazards, data hazards, and control hazards as well as methods to address these hazards including forwarding, interlocking, and instruction scheduling.
The document discusses parallel processing and pipelining techniques in computer organization. It covers topics like parallel processing concepts and classifications, pipelining concepts and how it increases computational speed, arithmetic and instruction pipelining, handling pipeline hazards like data dependencies and branches. The key advantages of pipelining include decomposing tasks into sequential sub-operations that can complete concurrently, improving throughput and achieving speedup close to the number of pipeline stages when the number of tasks is large.
This document discusses parallel processing and pipelining. It describes different levels and types of parallel processing including job level, task level, inter-instruction level, and intra-instruction level parallelism. It also covers Flynn's classification of parallel computers as SISD, SIMD, MISD, and MIMD based on the number of instruction and data streams. Pipelining is defined as decomposing a process into sub-operations that execute concurrently. The key benefits of pipelining are that multiple computations can progress simultaneously through different pipeline stages.
Pipelining is a technique used in computer processors to overlap the execution of instructions to enhance performance. It works by dividing instruction execution into discrete stages, such as fetch, decode, execute, memory, and write-back, so that multiple instructions can be in different stages at the same time. In a pipelined processor, the average time to complete an instruction is reduced compared to a non-pipelined processor, leading to higher throughput. However, special techniques are needed to handle data and structural hazards that can occur when instructions interact in unexpected ways within the pipeline.
The document discusses parallel processing and provides classifications of parallel computer architectures. It describes Flynn's classification of computer architectures as single instruction stream single data stream (SISD), single instruction stream multiple data stream (SIMD), multiple instruction stream single data stream (MISD), and multiple instruction stream multiple data stream (MIMD). It also discusses pipeline computers, array processors, and multiprocessor systems as different architectural configurations for parallel computers. Pipelining is described as a technique to decompose a process into sub-operations that execute concurrently in dedicated segments to achieve overlapping computation.
Pipelining of Processors - Computer Architecture, by Haris456
Pipelining is a technique used in microprocessors to overlap the execution of multiple instructions to increase throughput. It works by dividing the instruction execution process into discrete stages, such as fetch, decode, execute, memory, and write-back. When an instruction enters one stage, the previous instruction can enter the next stage, allowing the processor to complete more than one instruction per clock cycle. Pipelining reduces the time needed to complete a series of instructions by allowing the stages to process separate instructions simultaneously rather than sequentially.
This document discusses parallel processing and pipelining techniques used to improve computer performance. It covers parallel processing classifications including SISD, SIMD, MISD, and MIMD models. Pipelining is defined as decomposing tasks into sequential suboperations that execute concurrently. Arithmetic and instruction pipelines are described as having multiple stages to overlap processing of different instructions. Vector processing and array processors are mentioned as techniques to perform simultaneous operations on multiple data items.
This document discusses datapath design and arithmetic operations in computer architecture. It covers:
1) The design of circuits to implement basic fixed-point arithmetic instructions like addition, subtraction, multiplication, and division. Multiplication can be done with combinational or sequential circuits using an array of adders, while division is typically sequential using repeated subtraction.
2) The Arithmetic Logic Unit (ALU) is used to process arithmetic and logical instructions and employs a chain of identical 1-bit adders. Coprocessors can provide fast hardware implementations for complex arithmetic functions.
3) Pipeline processing is used to improve processor throughput by dividing arithmetic operations into stages to allow overlapped processing, at the cost of requiring more hardware resources.
Here are the answers to the questions:
1. Pipeline cycle time = Maximum delay of any stage + Latch delay
= 90 ns + 10 ns = 100 ns
2. Non-pipeline execution time for one task = Total delay of all stages
= 60 + 50 + 90 + 80 = 280 ns
3. Speed up ratio = Non-pipeline time/Pipeline time
= 280/100 = 2.8
4. Pipeline time for 1000 tasks = Pipeline cycle time x Number of tasks
= 100 ns x 1000 = 100,000 ns = 100 μs
5. Sequential time for 1000 tasks = Non-pipeline time per task x Number of tasks
= 280 ns x 1000 = 280,000 ns = 280 μs
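These figures can be checked mechanically. A small Python sketch of the same arithmetic (variable names are ours), which also shows the exact pipelined total once the k - 1 fill cycles are counted:

```python
stages = [60, 50, 90, 80]            # stage delays in ns
latch = 10                           # latch delay in ns
n, k = 1000, len(stages)

cycle = max(stages) + latch          # 1. pipeline cycle time: 100 ns
per_task = sum(stages)               # 2. non-pipelined time per task: 280 ns
speedup = per_task / cycle           # 3. speedup ratio: 2.8
pipe_total = (k + n - 1) * cycle     # 100,300 ns with the k - 1 fill cycles;
                                     # approximated above as n * cycle = 100,000 ns
seq_total = n * per_task             # 5. sequential time: 280,000 ns

print(cycle, per_task, speedup, pipe_total, seq_total)
# 100 280 2.8 100300 280000
```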
The document provides an overview of pipelining in computer processors. It discusses how pipelining works by dividing processor operations like fetch, decode, execute, memory, and write-back into discrete stages that can overlap, improving throughput. Key points made include:
- Pipelining allows multiple instructions to be in different stages of completion at the same time, improving instruction throughput.
- The document uses an example of a sequential laundry process versus a pipelined laundry process to illustrate how pipelining improves efficiency.
- It describes the five main stages of a RISC instruction set pipeline - fetch, decode, execute, memory, and write-back - and the work done and data passed between each stage.
Design pipeline architecture for various stage pipelines, by Mahmudul Hasan
This document discusses the concepts of single-cycle control, multi-cycle control, and pipelining in processors. It explains that single-cycle control has a low CPI but a long clock period, while multi-cycle control has a short clock period but high CPI. Pipelining allows overlapping the execution of instructions to improve throughput. The document presents diagrams of 5-stage instruction pipelines and describes the fetch, decode, execute, memory, and write-back stages. It also discusses pipeline hazards and performance improvements from pipelining over single-cycle and multi-cycle designs.
The CPU is the central processing unit of a computer and consists of three main parts - the control unit, register set, and ALU. The control unit directs operations between the register set and ALU. The register set stores intermediate data and the ALU performs arithmetic and logic operations. The CPU follows a fetch-execute cycle where it fetches instructions from memory and stores them in the instruction register before executing them. Common instruction types include processor-memory operations, I/O operations, data processing, and control operations.
This document discusses parallel processing and different types of parallel computers. It describes Flynn's classification of parallel computers based on the number of instruction and data streams as SISD, SIMD, MISD, and MIMD. It then provides details about each classification including characteristics, examples, and limitations. The document also covers topics like pipelining, interconnection networks, and how pipelining can improve the speed of computation.
This document summarizes several course projects completed by Setiawan Soekamtoputra for their Master's degree. The projects include:
1) Design of a 32-bit pipelined CPU in Verilog including implementation of an ASIC flow, multiplier with accumulator case study, and pipeline optimization case study.
2) Development of a monitor program for the MC68000 processor in assembly language including common memory and register commands and exception handlers.
3) Implementation of a high-performance pipelined MIPS processor in VHDL including hazard detection and data forwarding units to handle data and branch hazards.
4) Network on chip prototype designs including a 3-node partially connected mesh design in SystemC and
This document discusses the implementation of a basic MIPS processor including building the datapath, control implementation, pipelining, and handling hazards. It describes the MIPS instruction set and 5-stage pipeline. The datapath is built from components like registers, ALUs, and adders. Control signals are designed for different instructions. Pipelining is implemented using techniques like forwarding and branch prediction to handle data and control hazards between stages. Exceptions are handled using status registers or vectored interrupts.
The document discusses pipelining in computer processors. It describes how pipelining can increase throughput by overlapping the execution of multiple instructions. It discusses the basic pipeline stages for a RISC instruction set, including fetch, decode, execute, memory access, and writeback. It also describes several types of pipeline hazards that can occur, such as structural hazards caused by resource conflicts, data hazards when instructions depend on previous results, and control hazards with branches. Forwarding techniques are presented to help address data hazards.
VampirTrace provides instrumentation and run-time measurement capabilities. It allows for automatic, manual, and binary instrumentation. Run-time measurement includes collecting trace data behind the scenes and post-processing. Users have options to configure various settings like environment variables, hardware performance counters, memory allocation counters, filtering, and grouping. FAQ and troubleshooting information is also available.
Microchip's PIC Microcontroller - the presentation covers: embedded systems, applications, Harvard and von Neumann architecture, the PIC microcontroller instruction set, PIC assembly language programming, PIC basic circuit design and its programming, etc.
The document summarizes the RISC pipeline architecture. It discusses the five stages of the classic RISC pipeline: instruction fetch, instruction decode, execute, memory access, and writeback. Each stage is involved in processing one instruction at a time through the pipeline. The instruction fetch stage retrieves instructions from the instruction cache. The decode stage decodes the instruction and computes branch targets. The execute stage performs arithmetic and logical operations. The memory access stage handles data memory access. Finally, the writeback stage writes results back to registers. The document also discusses hazards like structural, data, and control hazards that can occur in pipelines.
Parallel processing involves performing multiple tasks simultaneously to increase computational speed. It can be achieved through pipelining, where instructions are overlapped in execution, or vector/array processors where the same operation is performed on multiple data elements at once. The main types are SIMD (single instruction multiple data) and MIMD (multiple instruction multiple data). Pipelining provides higher throughput by keeping the pipeline full but requires handling dependencies between instructions to avoid hazards slowing things down.
This document discusses instruction pipelining and main memory. It begins by explaining how an instruction pipeline works, overlapping the fetch, decode, and execute phases of instruction processing. It notes some difficulties in pipelining including resource conflicts, data dependencies, and branch instructions. It then discusses pipeline control and performance, noting that pipelining provides faster processing by decomposing tasks into sequential sub-operations that can overlap. It concludes by answering questions about pipelining hazards and calculating pipeline metrics for example processors.
The document discusses pipeline computing and its various types and applications. It defines pipeline computing as a technique to decompose a sequential process into parallel sub-processes that can execute concurrently. There are two main types - linear and non-linear pipelines. Linear pipelines use a single reservation table while non-linear pipelines use multiple tables. Common applications of pipeline computing include instruction pipelines in CPUs, graphics pipelines in GPUs, software pipelines using pipes, and HTTP pipelining. The document also discusses implementations of pipeline computing and its advantages like reduced cycle time and increased instruction throughput.
Various processor architectures are described in this presentation. It could be useful for people working on hardware selection and processor identification.
This document discusses general-purpose processors. It begins by introducing general-purpose processors and their basic architecture, which consists of a control unit and datapath that is designed to perform a variety of computation tasks. It then describes the operations of loading, storing, and arithmetic/logical operations that can be performed by the datapath. Subsequent sections provide more details on the control unit and how it sequences operations, instruction cycles, architectural considerations like bit-width and clock frequency, and techniques for improving performance like pipelining and superscalar execution. The document concludes with sections on assembly-level instructions and programmer considerations.
This document discusses schema refinement through normalization. Schema refinement aims to eliminate data redundancy and anomalies like insertion, update, and deletion anomalies. It introduces normalization as a technique to decompose tables and refine the schema. Redundancy can lead to problems like redundant storage, update anomalies if one copy of data is changed without updating others, and insertion and deletion anomalies where adding or removing data could impact unrelated information. The document uses an example of a student details table to illustrate these problems and how decomposition can address redundancy.
Joins in DBMS - describes how joins are important and necessary in d..., by AshokRachapalli1
Joins in DBMS allow combining data from multiple tables. Inner joins return rows where the join condition is satisfied, while outer joins also return rows with no matches and fill unmatched columns with NULL. Natural joins automatically join on common columns with matching names and domains, while theta joins use any comparison operator in the join condition. Equi joins specifically use equality comparisons.
Database languages are used to define, manipulate, and control access to data in a database management system. There are four main types of database languages: Data Definition Language (DDL) defines the database structure; Data Manipulation Language (DML) reads, inserts, updates, and deletes data; Data Control Language (DCL) controls user access privileges; and Transaction Control Language (TCL) manages transactions and rolling back or committing changes to the database.
The document discusses register transfer languages (RTL) which are used to specify the operations and timing of digital circuits. It covers micro-operations which define data transfers, RTL which specifies when micro-operations occur, and how RTL specifications can be realized through hardware implementation or simulated using VHDL. Examples are provided of RTL specifications for simple counters and controllers to illustrate these concepts.
The document discusses different levels of computer memory and cache memory. It describes four levels of memory:
1) Register - Stores data accepted by the CPU.
2) Cache memory - Faster memory that temporarily stores frequently accessed data from main memory.
3) Main memory - The memory the computer currently works on but data is lost when powered off.
4) Secondary memory - External memory that stores data permanently but is slower than main memory.
It then discusses cache memory in more detail, describing it as very high-speed memory that stores copies of frequently used data from main memory to reduce average access time. It explains the concepts of cache hits, misses, and hit ratio.
The document discusses different types of addressing modes used in computer instructions, including implied, immediate, direct, indirect, register direct, register indirect, relative, indexed, base register, auto-increment, and auto-decrement addressing modes. It provides examples and explanations of each addressing mode type.
The document discusses input/output (I/O) organization in a computer system. It describes I/O interfaces that allow communication between internal storage and external devices. Data transfer can occur via programmed I/O, interrupt-initiated I/O, or direct memory access (DMA). DMA allows direct transfer between I/O devices and memory without CPU involvement by using a DMA controller. An I/O processor (IOP) is also described, which is a dedicated processor that handles I/O operations and transfers data between devices and memory.
Virtual memory allows programs to access memory addresses that do not physically exist, expanding the available address space. It works by dividing memory into pages that are stored on disk until needed, then copied into RAM. When a program accesses a non-present page, a page fault occurs and the operating system handles copying the correct page into memory transparently to the program. This allows more programs to run than would otherwise fit in physical memory.
This document discusses techniques for reducing cache misses and improving memory performance. It introduces the concepts of compulsory, capacity and conflict misses. Methods covered for reducing misses include increasing block size, associativity, using victim caches, pseudo-associativity, hardware/software prefetching, and compiler optimizations like merging arrays, loop interchange, fusion and blocking. Both hardware and software prefetching are described as well as the tradeoffs between binding and non-binding prefetching.
Disk-based storage uses a memory hierarchy to balance performance and cost. Large, slower disks are used for persistent storage due to their low cost per byte, while smaller, faster memory like DRAM is used for temporary storage. A disk contains platters that spin, allowing read/write heads to access sectors organized into tracks on the platters. Disk access time is dominated by seek time to position the heads and rotational latency waiting for the desired sector to spin under the head. Disks present a logical block interface to the operating system, while sectors are mapped to physical locations on disk surfaces.
Digital systems perform elementary operations called micro operations on information stored in registers. There are two main types: arithmetic micro operations that change information, such as addition, subtraction, and shift operations; and logic micro operations that perform binary operations on bit strings, like AND, OR, and XOR. Common components that perform these micro operations include binary adders, adder-subtractors, incrementers, and the Arithmetic Logic Shift Unit.
The document discusses computer instruction formats and addressing modes. It provides details on:
- Instruction codes contain operation codes and addresses to specify operations and memory/register locations.
- There are two addressing modes - direct addressing uses the operand's address while indirect uses a pointer.
- A basic instruction format has 12 bits for the address, 1 bit for the mode, and 3 bits for the operation code.
- An instruction cycle has four phases - fetch, decode, read effective address, and execute the instruction.
There are two main types of computer network architectures: peer-to-peer and client/server. Peer-to-peer networks connect computers of equal status without a central server, making them useful for small networks but less secure. Client/server networks have a central server that manages resources and authorization for client computers, providing better security, performance, and backup but at a higher cost than peer-to-peer.
A computer network can be categorized based on its size as PAN, LAN, MAN, or WAN. A PAN covers an area of about 30 feet and connects personal devices like laptops and phones. A LAN connects computers within a building using cables, providing faster data transfer and higher security than larger networks. A MAN interconnects multiple LANs within a city using telephone lines to connect organizations like businesses, schools, and governments. A WAN spans large geographic areas like countries and states, with the internet being the largest example, connecting networks globally.
Data encoding converts data into a signal form for transmission. It represents digital data with digital or analog signals. Common encoding methods include unipolar, bipolar, and polar encoding. Unipolar encoding uses a single voltage level to represent 1s and 0s, while bipolar uses two voltage levels. Specific techniques include NRZ, RZ, and biphase encoding. NRZ encodes without returning to zero between bits, while RZ returns to zero mid-bit. Biphase encodings like Manchester and differential Manchester use signal transitions to represent data and synchronize clocks. Block coding maps groups of bits to code words, like 4B/5B encoding which maps 4 data bits to 5-bit code words.
Flow control is a data link layer mechanism that regulates the amount of data sent by the sender to ensure the receiver can process it. It works by having the sender wait for acknowledgment from the receiver before sending more data. Common flow control methods include stop-and-wait, which only allows one packet to be sent at a time, and sliding window protocols, which allow multiple packets to be sent before waiting for acknowledgment. Flow control prevents buffer overflows and frame losses at the receiver.
This document summarizes a lecture on register transfer language and microoperations. It introduces register transfer language as a way to describe the transfer of data between registers using microoperations. Common microoperations include register transfer, arithmetic operations, logic operations, and shift operations. Specific circuit implementations for operations like addition, subtraction, and incrementing are discussed. Memory transfer microoperations for reading from and writing to memory are also covered.
This document provides an introduction and overview of the Python programming language. It outlines the key topics that will be covered in a Python tutorial, including basic data types, variables, control structures, functions, classes, exceptions, modules and packages, and the standard library. The document consists of slides from a 2002 presentation on Python given by Guido van Rossum, the creator of Python. It encourages attendees to follow along with the tutorial using the interactive Python shell.
This document provides an overview of the OSI reference model, which is an internationally standardized architecture for how network communication should work. It describes the seven layers of the OSI model from the physical layer up to the application layer. Each layer provides services to the layer above it and receives services from the layer below. The layers relate to either communication technologies (layers 1-4) or user applications (layers 5-7). The document also discusses how the OSI model differs from Internet protocols and covers concepts like connection types, reliability, and the relationship between services and protocols.
Packet switching is a technique used in computer networks where messages are divided into packets that contain header information with the destination. Each packet is routed independently through the network based on its header. There are two main approaches for packet switching: datagram packet switching treats each packet independently and routes them without maintaining connection state, while virtual circuit switching establishes a pre-planned route via a call setup before sending packets along a fixed path for the connection's duration.
2. Pipelining and Vector Processing
PARALLEL PROCESSING

Parallel processing: execution of concurrent events in the computing process to achieve faster computational speed.

Levels of parallel processing:
- Job or Program level
- Task or Procedure level
- Inter-Instruction level
- Intra-Instruction level
3. Pipelining and Vector Processing
PARALLEL COMPUTERS

Architectural classification (Flynn's classification):
- Based on the multiplicity of instruction streams and data streams
- Instruction stream: sequence of instructions read from memory
- Data stream: operations performed on the data in the processor

                                 Number of Data Streams
                                 Single     Multiple
Number of Instruction   Single   SISD       SIMD
Streams                 Multiple MISD       MIMD
4. Pipelining and Vector Processing
COMPUTER ARCHITECTURES FOR PARALLEL PROCESSING

Von Neumann based:
- SISD: superscalar processors, superpipelined processors, VLIW
- MISD: nonexistence
- SIMD: array processors, systolic arrays, associative processors
- MIMD: shared-memory multiprocessors (bus based, crossbar switch based, multistage IN based);
  message-passing multicomputers (hypercube, mesh, reconfigurable)

Alongside the von Neumann based designs, the taxonomy also lists dataflow and reduction architectures.
5. Pipelining and Vector Processing
SISD COMPUTER SYSTEMS

[Diagram: Control Unit -> (instruction stream) -> Processor Unit <-> (data stream) <-> Memory]

Characteristics:
- Standard von Neumann machine
- Instructions and data are stored in memory
- One operation at a time

Limitations (von Neumann bottleneck):
- Maximum speed of the system is limited by the memory bandwidth (bits/sec or bytes/sec)
- Limitation on memory bandwidth
- Memory is shared by CPU and I/O
7. Pipelining and Vector Processing
MISD COMPUTER SYSTEMS

[Diagram: several control units (CU), each with its own instruction stream from memory (M), drive processors (P) operating on a single data stream]

Characteristics:
- There is no computer at present that can be classified as MISD
8. Pipelining and Vector Processing
SIMD COMPUTER SYSTEMS

[Diagram: a single Control Unit broadcasts one instruction stream to multiple processor units (P), which reach memory modules (M) through an alignment network; the program arrives over a data bus]

Characteristics:
- Only one copy of the program exists
- A single controller executes one instruction at a time
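The SIMD programming model (one instruction applied to many data elements) can be illustrated with NumPy's vectorized operations; this is an analogy to the machine organization above, not a model of the hardware:

```python
import numpy as np

# One instruction ("add") applied to every element pair at once, the way
# a SIMD control unit broadcasts a single instruction to all processing
# elements.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

c = a + b      # elementwise add: one operation, multiple data
print(c)       # [11. 22. 33. 44.]
```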
9. Pipelining and Vector Processing
TYPES OF SIMD COMPUTERS

Array processors
- The control unit broadcasts instructions to all PEs, and all active PEs execute the same instructions
- Examples: ILLIAC IV, GF-11, Connection Machine, DAP, MPP

Systolic arrays
- Regular arrangement of a large number of very simple processors constructed on VLSI circuits
- Examples: CMU Warp, Purdue CHiP

Associative processors
- Content addressing
- Data transformation operations over many sets of arguments with a single instruction
- Examples: STARAN, PEPE
10. Pipelining and Vector Processing
MIMD COMPUTER SYSTEMS

[Diagram: processor-memory pairs (P, M) attached to an interconnection network that provides shared memory]

Characteristics:
- Multiple processing units
- Execution of multiple instructions on multiple data

Types of MIMD computer systems:
- Shared-memory multiprocessors
- Message-passing multicomputers
11. Pipelining and Vector Processing
SHARED MEMORY MULTIPROCESSORS

[Diagram: processors (P) and memory modules (M) joined by an interconnection network (IN): buses, multistage IN, or crossbar switch]

Characteristics:
- All processors have equally direct access to one large memory address space

Example systems:
- Bus and cache-based systems: Sequent Balance, Encore Multimax
- Multistage IN-based systems: Ultracomputer, Butterfly, RP3, HEP
- Crossbar switch-based systems: C.mmp, Alliant FX/8

Limitations:
- Memory access latency
- Hot spot problem
12. Pipelining and Vector Processing
MESSAGE-PASSING MULTICOMPUTERS

[Diagram: processors (P), each with its own memory (M), joined by a message-passing network of point-to-point connections]

Characteristics:
- Interconnected computers
- Each processor has its own memory, and processors communicate via message passing

Example systems:
- Tree structure: Teradata, DADO
- Mesh-connected: Rediflow, Series 2010, J-Machine
- Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III

Limitations:
- Communication overhead
- Hard to program
13. Pipelining and Vector Processing
PIPELINING

A technique of decomposing a sequential process into suboperations, with each subprocess being executed in a dedicated segment that operates concurrently with all other segments.

Example: Ai * Bi + Ci for i = 1, 2, 3, ..., 7

Segment 1:  R1 <- Ai, R2 <- Bi        (load Ai and Bi)
Segment 2:  R3 <- R1 * R2, R4 <- Ci   (multiply and load Ci)
Segment 3:  R5 <- R3 + R4             (add)

[Diagram: Ai, Bi, and Ci come from memory; segment 1 latches R1 and R2, segment 2 holds a multiplier feeding R3 alongside R4, and segment 3 holds an adder producing R5]
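A minimal Python sketch of the three-segment example above, stepping every segment once per clock so that all three suboperations overlap (list and register names are ours):

```python
# Three-segment pipeline for Ai * Bi + Ci, i = 1..7. Segments are
# evaluated in reverse order each clock so that every segment consumes
# the values latched on the previous clock.
A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [1, 1, 1, 1, 1, 1, 1]

R1 = R2 = R3 = R4 = R5 = None
results = []

for clock in range(len(A) + 2):            # k + n - 1 clocks: k = 3, n = 7
    if R3 is not None:                     # segment 3: R5 <- R3 + R4
        R5 = R3 + R4
        results.append(R5)
    if R1 is not None:                     # segment 2: R3 <- R1 * R2, R4 <- Ci
        R3, R4 = R1 * R2, C[clock - 1]
    else:
        R3 = R4 = None
    if clock < len(A):                     # segment 1: R1 <- Ai, R2 <- Bi
        R1, R2 = A[clock], B[clock]
    else:
        R1 = R2 = None

print(results)                             # [8, 13, 16, 17, 16, 13, 8]
```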
15. Pipelining and Vector Processing
GENERAL PIPELINE

General structure of a 4-segment pipeline (all registers driven by a common clock):

Input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4

Space-time diagram (tasks T1..T6 flowing through the four segments):

Clock cycle: 1   2   3   4   5   6   7   8   9
Segment 1:   T1  T2  T3  T4  T5  T6
Segment 2:       T1  T2  T3  T4  T5  T6
Segment 3:           T1  T2  T3  T4  T5  T6
Segment 4:               T1  T2  T3  T4  T5  T6
16. Pipelining and Vector Processing
PIPELINE SPEEDUP

n: number of tasks to be performed

Conventional machine (non-pipelined):
  tn: clock cycle
  Time required to complete the n tasks: n * tn

Pipelined machine (k stages):
  tp: clock cycle (time to complete each suboperation)
  Time required to complete the n tasks: (k + n - 1) * tp

Speedup:
  Sk = n * tn / ((k + n - 1) * tp)

  As n -> infinity, Sk -> tn / tp   ( = k, if tn = k * tp )
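The speedup formula is easy to evaluate directly. A small Python sketch (assuming, as in the limit above, tn = k * tp) showing Sk approaching k as n grows:

```python
def pipeline_speedup(n, k, t_n, t_p):
    """Speedup Sk of a k-stage pipeline over a non-pipelined machine."""
    return (n * t_n) / ((k + n - 1) * t_p)

# With t_n = k * t_p (here 80 = 4 * 20), Sk approaches k = 4 as n grows:
for n in (10, 100, 10000):
    print(n, round(pipeline_speedup(n, k=4, t_n=80, t_p=20), 3))
# 10 3.077
# 100 3.883
# 10000 3.999
```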
17. Pipelining and Vector Processing
PIPELINE AND MULTIPLE FUNCTION UNITS

[Diagram: four function units P1..P4 working in parallel on instructions Ii, Ii+1, Ii+2, Ii+3]

Example:
- 4-stage pipeline
- Suboperation in each stage: tp = 20 ns
- 100 tasks to be executed
- 1 task in a non-pipelined system: 20 * 4 = 80 ns

Pipelined system:     (k + n - 1) * tp = (4 + 99) * 20 = 2060 ns
Non-pipelined system: n * k * tp = 100 * 80 = 8000 ns
Speedup:              Sk = 8000 / 2060 = 3.88

A 4-stage pipeline is basically identical to a system with 4 identical function units.
18. Pipelining and Vector Processing
ARITHMETIC PIPELINE

Floating-point adder:  X = A x 2^a,  Y = B x 2^b

Segment 1: Compare the exponents (by subtraction)
Segment 2: Align the mantissas (choose the exponent, shift one mantissa)
Segment 3: Add or subtract the mantissas
Segment 4: Normalize the result (adjust the exponent)

[Diagram: the exponents a, b and mantissas A, B flow through latch registers R between segments; segment 1 produces the exponent difference that controls the mantissa alignment in segment 2]
19. Pipelining and Vector Processing
4-STAGE FLOATING POINT ADDER

A = a x 2^p,  B = b x 2^q
C = A + B = c x 2^r = d x 2^s   (r = max(p, q), 0.5 <= d < 1)

Stage by stage:
S1: the exponent subtractor computes t = |p - q|; the fraction selector routes the fraction with min(p, q) to the right shifter and passes the other fraction straight through
S2: the right shifter aligns the smaller fraction by t places
S3: the fraction adder produces c with exponent r = max(p, q)
S4: the leading zero counter and left shifter normalize c to d, and the exponent adder adjusts r by the shift count to give s
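A behavioral Python sketch of the four stages, operating on (fraction, exponent) pairs with fractions kept in [0.5, 1); it illustrates the data flow only, not the hardware:

```python
# Behavioral sketch of the 4-stage floating-point addition
# (numbers held as (fraction, exponent), fraction in [0.5, 1)).
def fp_add(a_frac, p, b_frac, q):
    # S1: compare exponents by subtraction
    t = abs(p - q)
    r = max(p, q)
    # S2: right-shift the fraction belonging to the smaller exponent
    if p < q:
        a_frac /= 2 ** t
    else:
        b_frac /= 2 ** t
    # S3: add the fractions
    c = a_frac + b_frac
    # S4: normalize into [0.5, 1), adjusting the exponent
    s = r
    while c >= 1.0:
        c, s = c / 2.0, s + 1
    while 0.0 < c < 0.5:
        c, s = c * 2.0, s - 1
    return c, s

# 0.75 x 2^3 + 0.5 x 2^2 = 8.0 = 0.5 x 2^4
print(fp_add(0.75, 3, 0.5, 2))   # (0.5, 4)
```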
20. Pipelining and Vector Processing
INSTRUCTION CYCLE

Six phases* in an instruction cycle:
1. Fetch an instruction from memory
2. Decode the instruction
3. Calculate the effective address of the operand
4. Fetch the operands from memory
5. Execute the operation
6. Store the result in the proper place

* Some instructions skip some phases
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the execution phase

==> 4-stage pipeline:
1. FI: Fetch an instruction from memory
2. DA: Decode the instruction and calculate the effective address of the operand
3. FO: Fetch the operand
4. EX: Execute the operation
21. Pipelining and Vector Processing
INSTRUCTION PIPELINE

Execution of three instructions in a 4-stage pipeline:

Conventional (sequential):
i   : FI DA FO EX
i+1 :             FI DA FO EX
i+2 :                         FI DA FO EX

Pipelined (overlapped):
i   : FI DA FO EX
i+1 :    FI DA FO EX
i+2 :       FI DA FO EX
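A small Python sketch that prints this overlapped schedule for any number of instructions (the layout convention is ours):

```python
# Print the overlapped schedule of a 4-stage pipeline (FI DA FO EX):
# each instruction enters the pipeline one cycle after its predecessor.
STAGES = ["FI", "DA", "FO", "EX"]

def space_time(n_instructions):
    for i in range(n_instructions):
        pad = ["  "] * i               # one-cycle offset per instruction
        print(f"i+{i}: " + " ".join(pad + STAGES))

space_time(3)
# i+0: FI DA FO EX
# i+1:    FI DA FO EX
# i+2:       FI DA FO EX
```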
22. Pipelining and Vector Processing
INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE

Timing when instruction 3 is a branch (instruction 4's first fetch is discarded and refetched once the branch resolves):

Step:            1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1:   FI  DA  FO  EX
Instruction 2:       FI  DA  FO  EX
Instruction 3:           FI  DA  FO  EX   (branch)
Instruction 4:               FI  -   -   FI  DA  FO  EX
Instruction 5:                            FI  DA  FO  EX
Instruction 6:                                FI  DA  FO  EX
Instruction 7:                                    FI  DA  FO  EX

[Flowchart: Segment 1: fetch instruction from memory, update PC -> Segment 2: decode instruction and calculate effective address -> branch? (yes: empty pipe, update PC) -> Segment 3: fetch operand from memory -> Segment 4: execute instruction -> interrupt? (yes: interrupt handling, update PC, empty pipe) -> continue]
23. Pipelining and Vector Processing
MAJOR HAZARDS IN PIPELINED EXECUTION

Structural hazards (resource conflicts)
- Hardware resources required by the instructions in simultaneous overlapped execution cannot be met.

Data hazards (data dependency conflicts)
- An instruction scheduled to be executed in the pipeline requires the result of a previous instruction, which is not yet available.
- Example (data dependency):
      R1 <- B + C
      R1 <- R1 + 1
  The INC reaches its DA stage before the ADD has produced R1, so a bubble must be inserted.

Control hazards
- Branches and other instructions that change the PC make the fetch of the next instruction be delayed.
- Example (branch address dependency): the new PC is not available until a JMP has been decoded, so the following instruction incurs a bubble before its IF ID OF OE OS stages.

Hazards in pipelines may make it necessary to stall the pipeline.
Pipeline interlock: detect hazards and stall until the hazard is cleared.
24. Pipelining and Vector Processing 24
STRUCTURAL HAZARDS
Computer Organization Computer Architectures Lab
Structural Hazards
Occur when some resource has not been
duplicated enough to allow all combinations
of instructions in the pipeline to execute
Example: With one memory-port, a data and an instruction fetch
cannot be initiated in the same clock
The Pipeline is stalled for a structural hazard
<- Two Loads with one port memory
-> Two-port memory will serve without stall
Instruction Pipeline
i      FI DA FO EX
i+1       FI DA FO EX
i+2          stall stall FI DA FO EX
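The stall pattern can be derived by checking, cycle by cycle, which
instructions need the single memory port. A toy Python model; the unstalled
timing i + k and treating FI and FO as the only memory users are assumptions
of this sketch:

STAGES = {"FI": 0, "DA": 1, "FO": 2, "EX": 3}

def memory_port_conflicts(n):
    # without stalling, instruction i would run stage k in cycle i + k
    conflicts = []
    for cycle in range(n + len(STAGES)):
        users = [i for i in range(n)
                 if cycle - i in (STAGES["FI"], STAGES["FO"])]
        if len(users) > 1:               # one port, more than one user
            conflicts.append((cycle, users))
    return conflicts

print(memory_port_conflicts(3))   # [(2, [0, 2])]: FO of i collides with FI of i+2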
25. Pipelining and Vector Processing 25
DATA HAZARDS
Data Hazards
Occur when the execution of an instruction
depends on the result of a previous instruction
ADD R1, R2, R3
SUB R4, R1, R5
Data hazards can be dealt with by either hardware
or software techniques
Hardware Technique
Interlock
- hardware detects the data dependencies and delays the scheduling
of the dependent instruction by stalling enough clock cycles
Forwarding (bypassing, short-circuiting)
- accomplished by a data path that routes a value from a source
(usually an ALU) to a user, bypassing a designated register. This
allows the value produced to be used at an earlier stage in the
pipeline than would otherwise be possible
Software Technique
Instruction Scheduling (compiler) for delayed load
Instruction Pipeline
26. Pipelining and Vector Processing 26 Instruction Pipeline
FORWARDING HARDWARE
Example:
ADD R1, R2, R3
SUB R4, R1, R5
3-stage Pipeline
I: Instruction Fetch
A: Decode, Read Registers, ALU Operations
E: Write the result to the destination register
Without Bypassing:
ADD   I A E
SUB     I . A E    (A of SUB stalls until ADD has written R1)
With Bypassing:
ADD   I A E
SUB     I A E      (the bypass path delivers ADD's result directly to the ALU)
[Datapath: the ALU result buffer drives the result write bus into the
register file and also a bypass path; a MUX at each ALU input selects
either the register-file value or the bypassed result, e.g. for R4]
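A behavioral sketch of this bypass in Python, using the registers from the
example; the one-entry ALU result buffer and the dictionary register file
are modeling assumptions of the sketch, not the slide's hardware:

regs = {"R1": 0, "R2": 10, "R3": 5, "R4": 0, "R5": 2}

def run(program):
    alu_buffer = None                    # (dest, value) from the previous A stage
    for op, dst, src1, src2 in program:
        # MUX at each ALU input: take the bypass path on a register match
        a = alu_buffer[1] if alu_buffer and alu_buffer[0] == src1 else regs[src1]
        b = alu_buffer[1] if alu_buffer and alu_buffer[0] == src2 else regs[src2]
        result = a + b if op == "ADD" else a - b
        if alu_buffer:                   # E stage of the previous instruction:
            regs[alu_buffer[0]] = alu_buffer[1]   # drive the result write bus
        alu_buffer = (dst, result)
    if alu_buffer:
        regs[alu_buffer[0]] = alu_buffer[1]

run([("ADD", "R1", "R2", "R3"),          # R1 <- 10 + 5
     ("SUB", "R4", "R1", "R5")])         # R4 <- 15 - 2, via the bypass path
print(regs["R1"], regs["R4"])            # 15 13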
27. Pipelining and Vector Processing 27
INSTRUCTION SCHEDULING
a = b + c;
d = e - f;

Delayed Load
A load requiring that the following instruction not use its result

Unscheduled code:        Scheduled code:
LW  Rb, b                LW  Rb, b
LW  Rc, c                LW  Rc, c
ADD Ra, Rb, Rc           LW  Re, e
SW  a, Ra                ADD Ra, Rb, Rc
LW  Re, e                LW  Rf, f
LW  Rf, f                SW  a, Ra
SUB Rd, Re, Rf           SUB Rd, Re, Rf
SW  d, Rd                SW  d, Rd
Instruction Pipeline
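The property the compiler enforces can be checked mechanically: no
instruction may read the destination of the load immediately before it.
A hedged Python sketch over the code above (the tuple encoding and the
helper name are assumptions):

def load_use_hazards(code):
    hazards = []
    for prev, curr in zip(code, code[1:]):
        op, dest, *_ = prev
        if op == "LW" and dest in curr[2:]:   # curr reads prev's loaded register
            hazards.append((prev, curr))
    return hazards

unscheduled = [("LW", "Rb", "b"), ("LW", "Rc", "c"),
               ("ADD", "Ra", "Rb", "Rc"), ("SW", "a", "Ra"),
               ("LW", "Re", "e"), ("LW", "Rf", "f"),
               ("SUB", "Rd", "Re", "Rf"), ("SW", "d", "Rd")]

print(load_use_hazards(unscheduled))
# Reports (LW Rc) -> (ADD) and (LW Rf) -> (SUB); the scheduled version
# above removes both by moving LW Re, e and SW a, Ra.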
28. Pipelining and Vector Processing 28
CONTROL HAZARDS
Branch Instructions
- Branch target address is not known until
the branch instruction is completed
- Stall -> waste of cycle times

Branch Instruction   FI DA FO EX
Next Instruction                 FI DA FO EX
(target address available only after the branch completes EX)
Dealing with Control Hazards
* Prefetch Target Instruction
* Branch Target Buffer
* Loop Buffer
* Branch Prediction
* Delayed Branch
Instruction Pipeline
29. Pipelining and Vector Processing 29
CONTROL HAZARDS
Instruction Pipeline
Prefetch Target Instruction
– Fetch instructions in both streams, branch not taken and branch taken
– Both are saved until the branch is executed. Then, select the right
instruction stream and discard the wrong one
Branch Target Buffer (BTB; Associative Memory)
– Entry: address of a previously executed branch; target instruction
and the next few instructions
– When fetching an instruction, search the BTB
– If found, fetch the instruction stream in the BTB;
– If not, fetch the new stream and update the BTB
Loop Buffer (High-Speed Register File)
– Stores an entire loop so it can be executed without accessing memory
Branch Prediction
– Guess the branch condition and fetch an instruction stream based on
the guess; a correct guess eliminates the branch penalty (a common
scheme is sketched below)
Delayed Branch
– Compiler detects the branch and rearranges the instruction sequence
by inserting useful instructions that keep the pipeline busy
in the presence of a branch instruction
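The slides leave the prediction scheme open; one common choice is a 2-bit
saturating counter per branch, which only flips its guess after two
consecutive surprises. A Python sketch (the class name and indexing the
table by PC are assumptions):

class TwoBitPredictor:
    def __init__(self):
        self.table = {}                    # pc -> counter in 0..3

    def predict(self, pc):
        return self.table.get(pc, 0) >= 2  # counter >= 2 means predict taken

    def update(self, pc, taken):
        c = self.table.get(pc, 0)
        self.table[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

p = TwoBitPredictor()
outcomes = [True] * 6 + [False] + [True] * 3   # a loop branch, mostly taken
correct = 0
for taken in outcomes:
    correct += (p.predict(0x40) == taken)
    p.update(0x40, taken)
print(f"{correct}/{len(outcomes)} correct")    # 7/10: two warm-up misses plus the loop exit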
30. Pipelining and Vector Processing 30
RISC PIPELINE
RISC Pipeline
RISC
- Machine with a very fast clock cycle that
executes at the rate of one instruction per cycle
- Enabled by a simple instruction set, a fixed-length
instruction format, and register-to-register operations
Instruction Cycles of Three-Stage Instruction Pipeline
Data Manipulation Instructions
I: Instruction Fetch
A: Decode, Read Registers, ALU Operations
E: Write a Register
Load and Store Instructions
I: Instruction Fetch
A: Decode, Evaluate Effective Address
E: Register-to-Memory or Memory-to-Register
Program Control Instructions
I: Instruction Fetch
A: Decode, Evaluate Branch Address
E: Write Register(PC)
31. Pipelining and Vector Processing 31
DELAYED LOAD
LOAD:  R1 <- M[address 1]
LOAD:  R2 <- M[address 2]
ADD:   R3 <- R1 + R2
STORE: M[address 3] <- R3

Three-segment pipeline timing

Pipeline timing with data conflict:
clock cycle   1  2  3  4  5  6
Load R1       I  A  E
Load R2          I  A  E
Add R1+R2           I  A  E
Store R3               I  A  E

Pipeline timing with delayed load:
clock cycle   1  2  3  4  5  6  7
Load R1       I  A  E
Load R2          I  A  E
NOP                 I  A  E
Add R1+R2              I  A  E
Store R3                  I  A  E

RISC Pipeline
The data dependency is taken care of by the compiler
rather than by the hardware
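What the compiler does here can be phrased as a one-pass rewrite: insert a
NOP whenever an instruction reads the register loaded by the instruction
directly before it. A Python sketch (the tuple encoding is an assumption):

def insert_delay_slots(code):
    out = []
    for instr in code:
        op, dest, *srcs = instr
        # a NOP goes in front of any instruction that uses the result
        # of the immediately preceding LOAD
        if out and out[-1][0] == "LOAD" and out[-1][1] in srcs:
            out.append(("NOP",))
        out.append(instr)
    return out

program = [("LOAD", "R1", "addr1"),
           ("LOAD", "R2", "addr2"),
           ("ADD",  "R3", "R1", "R2"),
           ("STORE", "addr3", "R3")]

for instr in insert_delay_slots(program):
    print(instr)      # a NOP appears between the second LOAD and the ADD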
32. Pipelining and Vector Processing 32
DELAYED BRANCH
Using no-operation instructions:
Clock cycles:    1  2  3  4  5  6  7  8  9  10
1. Load          I  A  E
2. Increment        I  A  E
3. Add                 I  A  E
4. Subtract               I  A  E
5. Branch to X               I  A  E
6. NOP                          I  A  E
7. NOP                             I  A  E
8. Instr. in X                        I  A  E

Rearranging the instructions:
Clock cycles:    1  2  3  4  5  6  7  8
1. Load          I  A  E
2. Increment        I  A  E
3. Branch to X         I  A  E
4. Add                    I  A  E
5. Subtract                  I  A  E
6. Instr. in X                  I  A  E

Compiler analyzes the instructions before and after
the branch and rearranges the program sequence by
inserting useful instructions in the delay steps
RISC Pipeline
33. Pipelining and Vector Processing 33
VECTOR PROCESSING
• A vector processor is an ensemble of hardware resources, including
vector registers, functional pipelines, processing elements, and register
counters, for performing vector operations.
• Vector processing occurs when arithmetic or logical operations are
applied to vectors. It is distinguished from scalar processing, which
operates on a single data item or a single pair of items. The conversion
from scalar code to vector code is called vectorization.
• Both pipelined processors and SIMD computers can perform vector
operations.
• Vector processing reduces the software overhead incurred in
maintaining looping control, reduces memory access conflicts, and,
above all, matches nicely with the pipelining and segmentation concepts
to generate one result per clock cycle continuously.
34. Pipelining and Vector Processing 34
VECTOR PROCESSING
Vector Processing
Vector Processing Applications
• Problems that can be efficiently formulated in terms of vectors
– Long-range weather forecasting
– Petroleum explorations
– Seismic data analysis
– Medical diagnosis
– Aerodynamics and space flight simulations
– Artificial intelligence and expert systems
– Mapping the human genome
– Image processing
Vector Processor (computer)
Ability to process vectors, and related data structures such as matrices
and multi-dimensional arrays, much faster than conventional computers
Vector Processors may also be pipelined
35. Pipelining and Vector Processing 35
VECTOR PROGRAMMING
DO 20 I = 1, 100
20 C(I) = B(I) + A(I)
Conventional computer
Initialize I = 0
20 Read A(I)
Read B(I)
Store C(I) = A(I) + B(I)
Increment I = I + 1
If I <= 100 goto 20
Vector computer
C(1:100) = A(1:100) + B(1:100)
Vector Processing
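The same contrast in Python, assuming NumPy is available: the scalar loop
touches one element per iteration, while the vector form expresses
C(1:100) = A(1:100) + B(1:100) as a single operation, as a vector machine
would.

import numpy as np

a = np.arange(100.0)
b = np.arange(100.0)

# Conventional (scalar) form: one element per loop iteration
c_scalar = np.empty(100)
for i in range(100):
    c_scalar[i] = a[i] + b[i]

# Vector form: one statement over all 100 elements
c_vector = a + b

assert (c_scalar == c_vector).all()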
36. Pipelining and Vector Processing 36
VECTOR INSTRUCTIONS
f1: V -> V       (Vector-Vector Instruction)
f2: V -> S       (Vector Reduction Instruction)
f3: V x V -> V   (Vector-Vector Instruction)
f4: V x S -> V   (Vector-Scalar Instruction)
V: Vector operand
S: Scalar operand
Vector Processing
Type  Mnemonic  Description (I = 1, ..., n)
f1    VSQR      Vector square root      B(I) <- SQR(A(I))
f1    VSIN      Vector sine             B(I) <- sin(A(I))
f1    VCOM      Vector complement       A(I) <- complement of A(I)
f2    VSUM      Vector summation        S <- sum of A(I)
f2    VMAX      Vector maximum          S <- max{A(I)}
f3    VADD      Vector add              C(I) <- A(I) + B(I)
f3    VMPY      Vector multiply         C(I) <- A(I) * B(I)
f3    VAND      Vector AND              C(I) <- A(I) . B(I)
f3    VLAR      Vector larger           C(I) <- max(A(I), B(I))
f3    VTGE      Vector test >=          C(I) <- 0 if A(I) < B(I), 1 if A(I) >= B(I)
f4    SADD      Vector-scalar add       B(I) <- S + A(I)
f4    SDIV      Vector-scalar divide    B(I) <- A(I) / S
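A quick illustration of one instruction per type, with NumPy arrays standing
in for vector registers (an assumption of this sketch, not how a vector
machine implements them):

import numpy as np

A = np.array([4.0, 9.0, 16.0])
B = np.array([1.0, 2.0, 3.0])
S = 10.0

vsqr = np.sqrt(A)        # f1  VSQR: B(I) <- SQR(A(I))
vsum = A.sum()           # f2  VSUM: S <- sum of A(I)
vadd = A + B             # f3  VADD: C(I) <- A(I) + B(I)
sadd = S + A             # f4  SADD: B(I) <- S + A(I)

print(vsqr, vsum, vadd, sadd)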
37. Pipelining and Vector Processing 37
VECTOR INSTRUCTION FORMAT
Vector Instruction Format:
| Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length |
Vector Processing

Pipeline for Inner Product:
[Source A and Source B feed a multiplier pipeline; its products feed
an adder pipeline that accumulates the partial sums]
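A hedged sketch of the accumulation this figure implies: with a multi-stage
adder pipeline, several partial sums build up in rotation and are combined
at the end. The 4-stage count and the function name are assumptions of the
sketch.

def inner_product(a, b, adder_stages=4):
    partial = [0.0] * adder_stages
    for i, (x, y) in enumerate(zip(a, b)):
        # each product from the multiplier lands in one of the
        # rotating partial sums held inside the adder pipeline
        partial[i % adder_stages] += x * y
    return sum(partial)          # combine the partial sums at the end

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [1.0] * 8
print(inner_product(a, b))       # 36.0, same as the sequential dot product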
39. Pipelining and Vector Processing 39 Vector Processing
MULTIPLE MEMORY MODULE AND INTERLEAVING
Address Interleaving
Different sets of addresses are assigned to
different memory modules
Multiple Module Memory:
[An address bus and a data bus connect four memory modules M0-M3;
each module has its own address register (AR), memory array, and
data register (DR)]
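A small sketch of the address decode this organization implies, assuming the
two low-order address bits select the module (a common interleaving choice):

MODULES = 4

def decode(address):
    module = address % MODULES        # low-order bits pick M0..M3
    word = address // MODULES         # remaining bits index within the module
    return module, word

for addr in range(8):
    m, w = decode(addr)
    print(f"address {addr} -> module M{m}, word {w}")
# Consecutive addresses fall in different modules, so a sequential
# vector fetch can keep all four modules busy at once.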