Advanced computer architecture lesson 5 and 6, by Ismail Mukiibi
The document discusses reduced instruction set computers (RISC) and compares them to complex instruction set computers (CISC). Key characteristics of RISC include simple, uniform instructions that are executed in one cycle; register-to-register operations with simple addressing modes; and a large number of registers to optimize register usage and minimize memory accesses. Studies show programs use simple operations, operands, and addressing modes most frequently, informing the RISC design which aims to efficiently support common cases through hard-wired, streamlined instructions.
This document discusses superscalar and super pipeline approaches to improving processor performance. Superscalar processors execute multiple independent instructions in parallel using multiple pipelines. Super pipelines break pipeline stages into smaller stages to reduce clock period and increase instruction throughput. While superscalar utilizes multiple parallel pipelines, super pipelines perform multiple stages per clock cycle in each pipeline. Super pipelines benefit from higher parallelism but also increase potential stalls from dependencies. Both approaches aim to maximize parallel instruction execution but face limitations from true data and other dependencies.
This document discusses multiprocessor architectures and synchronization issues in multiprocessors. It covers symmetric and distributed shared memory architectures, cache coherence issues, Flynn's taxonomy of parallel architectures including SISD, SIMD, MISD and MIMD models, and basic schemes for enforcing cache coherence including directory-based and snooping-based protocols. It also discusses performance issues, distributed shared memory, and synchronization mechanisms and primitives in multiprocessors.
This document discusses high performance computing techniques including NUMA (Non-Uniform Memory Access) and cache coherence. NUMA allows large scale multiprocessing while maintaining a unified memory space, with each processor node having its own local memory. Cache coherence ensures data consistency across processor caches. CC-NUMA (Cache Coherent NUMA) uses a directory protocol or snoopy protocol to maintain coherence between caches as memory requests access local or remote memory. Hardware solutions provide more effective cache coherence than software-only approaches.
Multithreading allows exploiting thread-level parallelism (TLP) to improve processor utilization. There are several categories of multithreading:
- Superscalar simultaneous multithreading interleaves instructions from multiple threads within a single out-of-order processor core to reduce idle resources.
- Coarse-grained multithreading switches between threads on long-latency events like cache misses to hide latency.
- Fine-grained multithreading interleaves threads at a finer instruction granularity in in-order cores.
- Multiprocessing physically separates threads onto multiple processor cores.
This document discusses high performance computing and Flynn's taxonomy of computer architectures. It describes MIMD architectures including shared memory SMP systems and distributed memory clusters. SMP systems have multiple similar processors that share main memory and I/O. Clusters are groups of interconnected computers that function as a single system. The document compares SMP and cluster architectures.
This document discusses three types of hardware multithreading: coarse-grained, fine-grained, and simultaneous multithreading (SMT). Coarse-grained multithreading allows another thread to run during long stalls of the first thread. Fine-grained multithreading interleaves instructions from multiple threads in a round-robin fashion to hide stalls. SMT issues instructions from multiple threads in the same cycle by using register renaming and dynamic scheduling to maximize utilization.
Very long instruction word (VLIW) refers to a processor architecture designed to take advantage of instruction-level parallelism.
This type of processor architecture is intended to allow higher performance without the inherent complexity of some other approaches.
Faster microprocessor design presentation at American International University-Bangladesh (AIUB). The presentation was given for the course "SELECTED TOPICS IN ELECTRICAL AND ELECTRONIC ENGINEERING (PROCESSOR AND DSP HARDWARE DESIGN WITH SYSTEM VERILOG, VHDL AND FPGAS) [MEEE]" by a final-semester M.Sc. student at AIUB.
There are two main types of parallel computers: shared memory multiprocessors and distributed memory multicomputers. Shared memory multiprocessors have multiple processors that can access a shared memory address space, while distributed memory multicomputers consist of separate computers connected by an interconnect network that communicate by message passing. Beowulf clusters are a type of distributed memory multicomputer made from interconnected commodity computers that provide high-performance computing at low cost. Programming distributed memory systems requires using message passing libraries to explicitly specify communication between processes on different computers.
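As a minimal hedged sketch of that message-passing style, the C fragment below uses the standard MPI library to move one integer between two processes; the payload value, tag, and ranks are illustrative assumptions, not from the summarized document.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 42;  /* illustrative payload */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            /* communication is explicit: process 0 sends one int to process 1 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }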
Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing frequency scaling. As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
Simultaneous multithreading (SMT) allows multiple independent threads to issue and execute instructions simultaneously each clock cycle by sharing the functional units of a superscalar processor. This improves performance over conventional multithreading approaches like coarse-grained and fine-grained multithreading. SMT provides good performance across a wide range of workloads by utilizing instruction issue slots and execution resources that would otherwise go unused when a single thread is limited by dependencies or cache misses. Implementing SMT requires minimal additional hardware like multiple program counters and per-thread scheduling structures.
This document provides an outline for the course CS-416 Parallel and Distributed Systems. It discusses key topics like parallel computing concepts, architectures, algorithms, and programming environments. Parallel computing involves using multiple compute resources simultaneously by breaking a problem into discrete parts that can execute concurrently on different processors. The main types of parallel processes are sequential and parallel. Parallelism is useful for solving huge complex problems faster using techniques like decomposition, data parallelism, and task parallelism. Popular parallel programming environments include MPI, OpenMP, and hybrid models.
This document discusses parallel processing and computer architecture. It begins by using the analogy of making sandwiches to explain the difference between sequential and parallel processing. Parallel processing allows multiple tasks to be performed simultaneously using multiple processors. The document then discusses different types of parallel processor systems like SISD, SIMD, MISD, and MIMD. It also covers considerations for multiprocessor operating system design, symmetric multiprocessor systems, bus organization, cache coherence issues, and solutions to cache coherence problems like directory and snoopy protocols.
Graphics Processing Unit Computer Architecture, by Haris456
This document discusses graphics processing units (GPUs) and their evolution from specialized hardware for graphics to general-purpose parallel processors. It covers key aspects of GPU architecture like their massively parallel threading model, SIMD execution, memory hierarchy, and how programming models like CUDA map computations to large numbers of threads grouped into blocks. Examples of GPU hardware like Nvidia's Fermi architecture are also summarized.
This document discusses parallel computer architectures and focuses on multiple instruction multiple data (MIMD) systems. It describes tightly coupled symmetric multiprocessors (SMPs) that share memory and loosely coupled clusters that communicate over a network. SMPs have advantages of performance, availability and scalability but clusters provide greater scalability and availability through their distributed nature. Operating systems for clusters must handle failure management, load balancing and parallelizing applications across nodes.
This document discusses computer architecture and microprocessors. It covers early Von Neumann architecture from 1940 and its features. It then discusses improvements with 32-bit conventional microprocessors including higher data throughput, larger addressing ranges, and faster clock speeds. Additional functions were added to microprocessors like memory management units, floating point units, and interrupt controllers. The document also covers concepts like pipelining, cache memory, memory interleaving, and parallel architectures that were developed to increase processing speeds as technology advanced.
The document discusses processors and their core functions. It explains that a processor is the central component of a computer that analyzes data, controls data flow, and manages core functions. It then describes the four main steps in a processor's work: fetch, decode, execute, and write back. The document also contrasts RISC (reduced instruction set computer) and CISC (complex instruction set computer) processors, noting key differences in their instruction sets, performance optimization approaches, decoding complexity, execution times, and common examples of each type.
An explicitly parallel program must specify concurrency and interaction between concurrent subtasks.
The former is sometimes also referred to as the control structure and the latter as the communication model.
This presentation discusses array processors, which are parallel computers composed of multiple identical processing elements that can operate simultaneously. The presentation covers the history of array processors, how they work, classifications, architectures, performance and scalability. It explains that array processors are well-suited for tasks involving repetitive arithmetic operations on large datasets, as they can improve performance for such workloads, but may not provide benefits for operations with data dependencies or decisions based on computations.
This document provides an introduction to high performance computer architecture and multiprocessors. It discusses how initial improvements in computer performance came from innovative manufacturing techniques and exploitation of instruction level parallelism (ILP). More recently, exploiting thread and process level parallelism across multiple processors has become a focus. The key types of multiprocessor architectures discussed are symmetric multiprocessors (SMPs) and distributed memory computers which use message passing. SMPs connect multiple processors to a shared memory using a bus, while distributed memory computers require explicit message passing between separate processor memories.
Chip Multithreading Systems Need a New Operating System Scheduler, by Sarwan Ali
This document discusses the need for a new operating system scheduler for chip multithreading (CMT) systems. CMT combines chip multiprocessing and hardware multithreading to improve processor utilization. The current schedulers do not scale well to the large number of hardware threads in CMT systems. A new scheduler is proposed that would model resource contention and use this to minimize contention and maximize throughput when assigning threads to processors. Experiments show that resource contention, especially in the processor pipeline, has a significant impact on performance and a CMT-aware scheduler could improve performance by up to 2x.
This document provides an overview of hardware multithreading techniques including fine-grained, coarse-grained, and simultaneous multithreading. Fine-grained multithreading switches threads after every instruction to hide latency. Coarse-grained multithreading switches threads only after long stalls to avoid slowing individual threads. Simultaneous multithreading issues instructions from multiple threads each cycle to better utilize functional units.
This document discusses superscalar and VLIW architectures. Superscalar processors can execute multiple independent instructions in parallel by checking for dependencies between instructions. VLIW architectures package multiple operations into very long instruction words to execute in parallel on multiple functional units with scheduling done at compile-time rather than run-time. The document compares CISC, RISC, and VLIW instruction sets and outlines advantages and disadvantages of the VLIW approach.
This document discusses parallel processing and parallel organizations. It describes four types of parallel organizations: single instruction single data (SISD), single instruction multiple data (SIMD), multiple instruction single data (MISD), and multiple instruction multiple data (MIMD). MIMD systems are further broken down into shared memory and distributed memory architectures. Cache coherence protocols like MESI are discussed for maintaining consistency across caches in shared memory multiprocessors.
This chapter discusses shared memory architecture and classifications of shared memory systems. It describes Uniform Memory Access (UMA), Non-Uniform Memory Access (NUMA), and Cache Only Memory Architecture (COMA). It also covers basic cache coherency methods like write-through, write-back, write-invalidate, and write-update. Finally, it discusses snooping protocols and cache coherency techniques used in shared memory systems.
This document provides an overview of system architecture and processor architectures. It discusses different types of system architecture like system-level building blocks, components of a system, hardware and software implementation, and instruction-level parallelism. It also describes various processor architectures like sequential, pipelined, superscalar, VLIW, SIMD, array, and vector processors. Additionally, it covers memory and addressing in systems-on-chip including memory considerations, virtual memory, and the process of determining physical memory addresses.
The document discusses various topics related to parallel and distributed computing including parallel computing resources and concepts, Flynn's taxonomy of parallel systems, parallel computer memory architectures like shared memory and distributed memory, parallel programming models such as shared memory, message passing and data parallel models, designing parallel programs including partitioning and load balancing, and different parallel computer architectures like vector processors, very long instruction word architecture, and superpipelined architecture.
The document provides an overview of microprocessors and microcontrollers. It discusses the basic architecture of microprocessors, including the Von Neumann and Harvard architectures. It compares RISC and CISC instruction sets. Microcontrollers are defined as single-chip computers containing a CPU, memory, and I/O ports. Common PIC microcontrollers are described along with their characteristics such as speed, memory types, and analog/digital capabilities. The document also outlines best practices for selecting a suitable microcontroller for a project, including identifying hardware interfaces, memory needs, programming tools, and cost/power constraints.
RISC vs CISC Computer Architecture and Design, by yousefzahdeh
RISC and CISC are two approaches to microprocessor architecture. RISC utilizes a small, highly optimized instruction set where each instruction is simple and can be executed in a single clock cycle. CISC uses more complex instructions that can perform multiple operations in one instruction. While RISC requires more instructions, CISC requires more complex processor design and has longer execution times. Over time, the two approaches have converged as technologies allow CISC processors to better support pipelining and RISC processors to include more complex instructions.
This document provides an overview of parallelism, including the need for parallelism, types of parallelism, applications of parallelism, and challenges in parallelism. It discusses instruction level parallelism and data level parallelism in software. It describes Flynn's classification of computer architectures and the categories of SISD, SIMD, MISD, and MIMD. It also covers hardware multi-threading, uni-processors vs multi-processors, multi-core processors, memory in multi-processor systems, cache coherency, and the MESI protocol.
This document provides an overview of parallelism and parallel computing architectures. It discusses the need for parallelism to improve performance and throughput. The main types of parallelism covered are instruction level parallelism, data parallelism, and task parallelism. Flynn's taxonomy is introduced for classifying computer architectures based on their instruction and data streams. Common parallel architectures like SISD, SIMD, MIMD are explained. The document also covers memory architectures for multi-processor systems including shared memory, distributed memory, and cache coherency protocols.
This document compares RISC and CISC processor architectures. It discusses that CISC processors have more complex instructions that can perform multiple operations, while RISC processors have simpler instructions that are optimized to complete in one clock cycle. CISC was developed earlier when memory was expensive, to reduce the number of instructions, while RISC focuses on increasing processor speed. RISC has advantages of faster execution and simpler hardware design, while CISC allows for more compact code.
This document discusses CPU scheduling and multithreaded programming. It covers key concepts in CPU scheduling like multiprogramming, CPU-I/O burst cycles, and scheduling criteria. It also discusses dispatcher role, multilevel queue scheduling, and multiple processor scheduling challenges. For multithreaded programming, it defines threads and their benefits. It compares concurrency and parallelism and discusses multithreading models, thread libraries, and threading issues.
This document discusses RISC and CISC architectures and how they have evolved. It explains that RISC aims to simplify instructions while CISC combines operations. Both seek to improve CPU performance but RISC reduces cycles per instruction while CISC minimizes instructions per program. It then covers characteristics of RISC like simple decoding and CISC like complex decoding. Pipelining is described as arranging hardware to simultaneously execute instructions to improve processor performance. The document ends by detailing the typical stages in a RISC processor's instruction pipeline.
This document provides an overview of parallel computing models and the evolution of computer hardware and software. It discusses:
1) Flynn's taxonomy which classifies computer architectures based on whether they have a single or multiple instruction/data streams. This includes SISD, SIMD, MISD, and MIMD models.
2) The attributes that influence computer performance such as hardware technology, algorithms, data structures, and programming tools. Performance is measured by turnaround time, clock rate, and cycles per instruction.
3) A brief history of computing from mechanical devices to modern electronic computers organized into generations defined by advances in hardware and software.
The document discusses several difficulties in pipelining processors, including timing variations between stages, data hazards when instructions reference the same data, branching unpredictability, and interrupt effects. It also lists advantages like reduced cycle time and increased throughput, and disadvantages like design complexity. Later, it covers superscalar processors that can execute multiple instructions per cycle using multiple arithmetic logic units and resources, and very long instruction word processors where the compiler statically schedules parallel instructions. Finally, it discusses RISC, CISC, and EPIC commercial processor examples.
SIMD (single instruction, multiple data) parallel processors exploit data-level parallelism by performing the same operation on multiple data points simultaneously using a single instruction. Vector processors are a type of SIMD parallel processor that operate on 1D arrays of data called vectors. They contain vector registers that can hold multiple data elements and functional units that perform arithmetic and logical operations in a pipelined fashion on entire vectors. Array processors are another type of SIMD machine composed of multiple identical processing elements that perform computations in lockstep under the control of a single instruction unit. Early examples include the ILLIAC IV array processor and the Cray-1 vector supercomputer. Multimedia extensions like MMX provide SIMD integer operations to improve performance of multimedia applications.
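To make the lockstep idea concrete, here is a hedged C sketch using x86 SSE2 intrinsics (a later descendant of the MMX extensions mentioned above): a single instruction adds four 32-bit integers at once. The array contents are illustrative.

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stdio.h>

    int main(void) {
        int a[4] = {1, 2, 3, 4};
        int b[4] = {10, 20, 30, 40};
        int c[4];
        __m128i va = _mm_loadu_si128((__m128i *)a);  /* four ints per 128-bit register */
        __m128i vb = _mm_loadu_si128((__m128i *)b);
        __m128i vc = _mm_add_epi32(va, vb);          /* one SIMD add performs all four */
        _mm_storeu_si128((__m128i *)c, vc);
        printf("%d %d %d %d\n", c[0], c[1], c[2], c[3]);  /* 11 22 33 44 */
        return 0;
    }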
This document discusses parallel processors, specifically single instruction multiple data (SIMD) processors. It provides details on vector processors and array processors. Vector processors utilize vector instructions that operate on arrays of data called vectors. They have vector registers, functional units, and load/store units. Array processors perform parallel computations on large data arrays using multiple identical processing elements. The document describes dedicated memory and global memory organizations for array processors. It provides examples of early SIMD machines like ILLIAC IV.
Computer System Architecture, by mohantysikun0. This PPT is based on computer system architecture.
This document discusses thread and process-level parallelism. It begins by introducing how improvements to computer performance initially came from manufacturing techniques and exploitation of instruction-level parallelism (ILP), but that ILP is now fully exploited. It states that the way to achieve higher performance now is through exploiting parallelism across multiple processes or threads. It provides examples of how individual transactions in a banking application could be executed in parallel.
This document discusses multiprocessor systems. It begins by defining a multiprocessor as an interconnection of two or more CPUs, memory, and I/O equipment. Multiprocessors are classified as either tightly coupled or loosely coupled based on how their memory is organized. Tightly coupled multiprocessors have shared memory across CPUs while loosely coupled multiprocessors have distributed memory. The document then covers various interconnection structures used in multiprocessors like bus, memory, switch networks, and hypercubes. It concludes by discussing advantages of multiprocessing like improved performance, reliability, and throughput.
CS304PC: Computer Organization and Architecture UNIT V, by Asst. Prof. M. Gokilavani
This document discusses RISC and CISC processors. It defines RISC as having a reduced instruction set with simple instructions that each take one clock cycle. CISC has a more complex instruction set that can take multiple cycles. The document outlines the characteristics and advantages/disadvantages of both RISC and CISC. It also discusses parallel processing techniques like pipelining and vector processing that improve processor throughput.
Unit IV discusses parallelism and parallel processing architectures. It introduces Flynn's classifications of parallel systems as SISD, MIMD, SIMD, and SPMD. Hardware approaches to parallelism include multicore processors, shared memory multiprocessors, and message-passing systems like clusters, GPUs, and warehouse-scale computers. The goals of parallelism are to increase computational speed and throughput by processing data concurrently across multiple processors.
This document discusses embedded systems and the MSP430 microcontroller. It begins with an introduction to embedded systems that defines them, lists their applications, and describes their classification based on generation and complexity. Next, it covers the typical features and architecture considerations of embedded systems, including the CPU, memory, I/O, and common peripherals. The document then discusses the MSP430 microcontroller family, providing details on the MSP430F2013 model, its memory map, CPU architecture and instruction set. It concludes with an overview of the variants in the MSP430 family.
1. CISC VS. RISC.
2. Agenda.
3. CPU Architecture.
4. Instruction Set Architecture (ISA). The group of instructions used to execute a program. Instructions take the form: Opcode + Operand. The ISA is an agreement between hardware and software on how they interact. Example: ADD R1, R2, R3
Such an instruction can be represented in binary as:
00101111100001111001010101010101
10111010100011110101001011011010
Two major schools of ISA: CISC & RISC.
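As a hedged illustration of how an opcode and operands pack into a word like the two shown above, the C sketch below decodes the first 32-bit pattern under a made-up field layout (6-bit opcode, three 5-bit register fields); real ISAs each define their own layout.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t insn = 0x2F879555u;  /* 00101111100001111001010101010101 */
        /* assumed layout: [31:26] opcode, [25:21] rd, [20:16] rs1, [15:11] rs2 */
        unsigned opcode = (insn >> 26) & 0x3F;
        unsigned rd     = (insn >> 21) & 0x1F;
        unsigned rs1    = (insn >> 16) & 0x1F;
        unsigned rs2    = (insn >> 11) & 0x1F;
        printf("opcode=%u rd=R%u rs1=R%u rs2=R%u\n", opcode, rd, rs1, rs2);
        return 0;
    }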
5. CISC Philosophy (Complex Instruction Set Computing). The primary goal is to complete a task in as few lines of assembly as possible. Used on PCs and laptops that need to process heavy graphics and computations. A single instruction carries out a complete multi-step operation.
(ex: MULT 2:3, 5:2 loads the two values into registers, multiplies the operands, and then stores the product in the appropriate register).
6. CISC Pros & Cons. Instruction size differs from one operation to another. Code size is smaller, but the number of cycles per instruction is higher. Needs better hardware and powerful processing. Performance is slow due to the amount of clock time taken by different instructions.
7. RISC Philosophy (Reduced Instruction Set Computing). Use only simple instructions that can be executed within one clock cycle. Keep all instructions the same size. Allow only load/store instructions to access memory.
(ex: the MULT command is divided into three separate commands: LOAD, PROD, and STORE).
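To see the same split from the programmer's side, here is a hedged C sketch: the comments show roughly how a load/store RISC breaks a one-line multiply into the LOAD/PROD/STORE steps named above. The mnemonics are illustrative, not from any particular ISA.

    int multiply(int *x, int *y) {
        /* on a load/store RISC, roughly:
             LOAD  R1, [x]       ; fetch first operand from memory
             LOAD  R2, [y]       ; fetch second operand from memory
             PROD  R3, R1, R2    ; register-to-register multiply
             STORE [result], R3  ; write back only if memory needs it */
        return *x * *y;
    }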
8. RISC Pros & Cons. Frees up microprocessor die space because of its simplicity. Needs large memory caches on the chip itself, so it requires very fast memory. Gives support for high-level languages (like C, C++, Java). Performance depends on the programmer or compiler.
9. CPU Performance Equation. The following equation is commonly used for expressing a computer's performance ability:
CPU Time = Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle)
CISC minimizes the number of instructions per program.
RISC does the opposite: it reduces the cycles per instruction.
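A quick hedged check of the equation with made-up numbers: suppose a CISC design needs 1,000,000 instructions at 4 cycles each, while a RISC design needs 2,000,000 instructions at 1 cycle each, both at an assumed 1 GHz clock.

    #include <stdio.h>

    int main(void) {
        double clock_hz = 1e9;  /* assumed 1 GHz clock for both designs */
        /* CPU Time = Instructions x Cycles/Instruction x Seconds/Cycle */
        double cisc = 1e6 * 4.0 * (1.0 / clock_hz);  /* fewer instructions, higher CPI */
        double risc = 2e6 * 1.0 * (1.0 / clock_hz);  /* more instructions, CPI = 1 */
        printf("CISC: %f s, RISC: %f s\n", cisc, risc);  /* 0.004 s vs 0.002 s */
        return 0;
    }

With these particular invented numbers the RISC design wins, but the equation makes clear that neither philosophy wins by definition; it depends on how far each factor moves.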
10. Summary.
Micro operations
Fetch, Indirect, Interrupt, Execute, Instruction Cycle
Control Unit
Hardwired Control Unit
Microprogrammed Control Unit
Wilkes' Microprogrammed Control Unit
Processor Organization and Architecture, by Dhaval Bagal
This document discusses processor organization and architecture. It covers the stored program concept where both instructions and data are stored in memory. It describes the Von Neumann architecture, which includes a main memory, ALU, control unit, and I/O. It discusses the registers used in processor control and execution like the program counter, accumulator, and instruction register. Finally, it examines addressing modes like immediate, direct, indirect, register, displacement, and stack addressing.
Computer Organization and Architecture Overview, by Dhaval Bagal
This document discusses computer architecture and organization. It defines architecture as the attributes visible to the programmer, like instruction set and data representation, while organization refers to the operational units and interconnections that implement the architecture. Examples of architectural design issues include whether there is a multiply instruction, while organizational issues could be whether multiplication is done in hardware or software. The architecture may not change often but the organization does as technology advances to improve performance and speed.
2. • The essence of the superscalar approach is the ability to execute instructions
independently and concurrently in different pipelines.
• In a traditional scalar organization, there is a single pipelined functional unit for integer operations and one for floating-point operations.
• In the superscalar organization, there are multiple functional units, each of which is
implemented as a pipeline.
• Each individual functional unit provides a degree of parallelism by virtue of its pipelined
structure.
• The use of multiple functional units enables the processor to execute streams of
instructions in parallel, one stream for each pipeline.
• It is the responsibility of the hardware, in conjunction with the compiler, to assure that the
parallel execution does not violate the intent of the program.
Superscalar Architecture
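As a hedged C illustration of what the hardware must verify: the first two statements below are independent and could be issued to two pipelines in the same cycle, while the third must wait for both. The function and variable names are invented for the example.

    void superscalar_example(int a, int b, int c, int d, int *out) {
        int x = a + b;  /* independent: can issue to pipeline 1 */
        int y = c * d;  /* independent: can issue to pipeline 2 in the same cycle */
        *out = x + y;   /* needs both x and y, so it must wait; executing it
                           early would violate the intent of the program */
    }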
4. • Superpipelining exploits the fact that many pipeline stages perform tasks that require less than
half a clock cycle.
• The base pipeline issues one instruction per clock cycle and can perform one pipeline stage per
clock cycle.
• Note that although several instructions are executing concurrently, only one instruction is in its
execution stage at any one time.
• The next part of the diagram shows a superpipelined implementation that is capable of performing
two pipeline stages per clock cycle.
• An alternative way of looking at this is that the functions performed in each stage can be split into
two nonoverlapping parts and each can execute in half a clock cycle.
• A superpipeline implementation that behaves in this fashion is said to be of degree 2. Finally, the
lowest part of the diagram shows a superscalar implementation capable of executing two
instances of each stage in parallel.
Superpipelining
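A small hedged calculation of the degree-2 case: splitting each stage in half doubles the clock rate, so in steady state instruction throughput doubles too. The 2 ns base cycle time is an assumed figure.

    #include <stdio.h>

    int main(void) {
        double base_cycle_ns = 2.0;  /* assumed base machine cycle time */
        int degree = 2;              /* each stage split into two half-stages */
        double super_cycle_ns = base_cycle_ns / degree;
        /* steady state: one instruction completes per clock in each design */
        printf("base: %.1f ns per instruction\n", base_cycle_ns);
        printf("superpipelined: %.1f ns per instruction\n", super_cycle_ns);
        return 0;
    }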
7. • True Data Dependency: A true data dependency exists when the second instruction can be fetched and decoded but cannot execute until the first instruction executes, because the second instruction needs data produced by the first instruction.
• Procedural Dependencies: The presence of branches in an instruction sequence
complicates the pipeline operation. The instructions following a branch (taken or not
taken) have a procedural dependency on the branch and cannot be executed until the
branch is executed.
• Resource Conflicts: A resource conflict is a competition of two or more instructions for the same resource at the same time. Examples of resources include memories, caches, buses, register-file ports, and functional units.
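A hedged C sketch of the first case, a true data dependency: the second statement consumes the value the first produces, so no amount of parallel hardware can execute the two in the same cycle.

    int dependency_example(int a, int b) {
        int x = a + b;  /* instruction 1: produces x */
        int y = x * 2;  /* instruction 2: cannot execute until instruction 1
                           has produced x (a true, read-after-write dependency) */
        return y;
    }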
9. • A taxonomy first introduced by Flynn [FLYN72] is still the most common way of categorizing systems with parallel processing capability. Flynn proposed the following categories of computer systems:
• Single instruction, single data (SISD) stream: A single processor executes a single
instruction stream to operate on data stored in a single memory. Uniprocessors fall into
this category.
• Single instruction, multiple data (SIMD) stream: A single machine instruction controls
the simultaneous execution of a number of processing elements on a lockstep basis.
Each processing element has an associated data memory, so that instructions are
executed on different sets of data by different processors. Vector and array processors
fall into this category.
• Multiple instruction, single data (MISD) stream: A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence. This structure is not commercially implemented.
• Multiple instruction, multiple data (MIMD) stream: A set of processors simultaneously
execute different instruction sequences on different data sets.
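As a hedged illustration of the SIMD category, the loop below applies one identical operation to every element of an array; a vector or array processor (or a vectorizing compiler) can execute many of these iterations in lockstep. The function name and array size are invented for the example.

    #define N 1024

    /* one operation, many data elements: the classic SIMD pattern */
    void saxpy(float a, const float x[N], float y[N]) {
        for (int i = 0; i < N; i++)
            y[i] = a * x[i] + y[i];  /* identical work per element,
                                        no cross-iteration dependencies */
    }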
10. • For SISD there is some sort of control unit (CU) that provides an instruction stream
(IS) to a processing unit (PU). The processing unit operates on a single data stream
(DS) from a memory unit (MU).
• For SIMD, there is still a single control unit, now feeding a single instruction stream to
multiple PUs. Each PU may have its own dedicated memory or there may be a shared
memory.
11. • Finally, with the MIMD, there are multiple control units, each feeding a separate
instruction stream to its own PU. The MIMD may be a shared-memory multiprocessor
or a distributed-memory multicomputer.
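A hedged C sketch of the shared-memory MIMD case using POSIX threads: two threads run separate instruction streams against one address space. The thread bodies and data are invented for the example.

    #include <pthread.h>
    #include <stdio.h>

    static int shared[2];  /* one memory, visible to both threads */

    static void *worker(void *arg) {
        int id = *(int *)arg;
        shared[id] = (id + 1) * 100;  /* each thread executes its own
                                         instruction stream (MIMD) */
        return NULL;
    }

    int main(void) {
        pthread_t t0, t1;
        int id0 = 0, id1 = 1;
        pthread_create(&t0, NULL, worker, &id0);
        pthread_create(&t1, NULL, worker, &id1);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        printf("%d %d\n", shared[0], shared[1]);  /* 100 200 */
        return 0;
    }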
12. • RISC stands for Reduced Instruction Set Computer. RISC processor design has
separate digital circuitry in the control unit, which produces all the necessary signals
needed for the execution of each instruction in the instruction set of the processor.
• Examples of RISC processors:
• IBM RS6000, MC88100
• DEC’s Alpha 21064, 21164 and 21264 processors
RISC Architecture
13. • RISC processors use a small and limited number of instructions. This puts emphasis
on software and compiler design due to the relatively simple instruction set.
• RISC machines mostly use a hardwired control unit.
• RISC processors consume less power and have high performance. RISC processors are typically heavily pipelined; this ensures that the hardware resources of the processor are utilized to the maximum, giving higher throughput while consuming less power.
• Each instruction is very simple and consistent. Most instructions in a RISC instruction set are very simple and execute in one clock cycle.
• RISC processors use simple addressing modes.
• RISC instructions are of uniform, fixed length.
• The RISC design philosophy generally incorporates a larger number of registers to reduce the amount of interaction with memory.
14. • CISC stands for Complex Instruction Set Computer. If the control unit contains a number of micro-electronic circuits that generate sets of control signals, and each micro-circuit is activated by a micro-code, the design approach is called CISC design. The primary goal of CISC architecture is to complete a task in as few lines of
assembly code as possible.
• Examples of CISC processors are:
• Intel 386, 486, Pentium, Pentium Pro, Pentium II, Pentium III
• Motorola’s 68000, 68020, 68040, etc.
CISC Architecture
15. • CISC chips have complex instructions. A CISC processor would come prepared with
a specific instruction (call it "MULT"). Thus, the entire task of multiplying two numbers (stored, say, at locations 2:3 and 5:2) can be completed with one instruction: MULT 2:3, 5:2.
• MULT is what is known as a "complex instruction." It operates directly on the
computer's memory banks and does not require the programmer to explicitly call any
loading or storing functions. It closely resembles a command in a higher level
language.
• There is a variety of instructions, many of which are complex; this makes for smaller assembly code and thus very low RAM consumption.
• CISC machines generally make use of complex addressing modes.
• The decision of CISC processor designers to provide a variety of addressing modes
leads to variable-length instructions. For example, instruction length increases if an
operand is in memory as opposed to in a register.
16. • The complex instruction set and smaller assembly code meant little work for the compiler and thus eased compiler design.
• CISC machines use a microprogrammed control unit, which consists of microprograms stored in a control memory such as ROM, from which the CPU fetches them and generates control signals.
• CISC processors have a limited number of registers, normally only a single set. Since the addressing modes make provision for memory operands, a limited amount of "costly" register storage is sufficient.