The document provides an overview of reduced instruction set computers (RISC) and their advantages over complex instruction set computers (CISC). It discusses how research in the 1970s-1980s found that most instructions were for data movement and program flow control. This led computer architects to design RISC processors that optimized for these common instructions. Key characteristics of RISC include simpler instruction sets that can execute in one clock cycle, multiple general-purpose registers to reduce memory access, and register-based operations. The Berkeley RISC is highlighted as one of the first RISC processors, using register windows to efficiently handle subroutine calls and parameter passing between nested routines.
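The register-window idea mentioned above can be illustrated with a toy model. This is only a sketch: the window sizes, register counts, and method names here are made up for illustration and are not the Berkeley RISC's actual parameters.

```python
class RegisterWindows:
    """Toy model of overlapping register windows for subroutine calls.

    Each window exposes incoming, local, and outgoing registers; a call
    slides the window so the caller's outgoing registers become the
    callee's incoming registers -- parameters pass without memory traffic.
    """

    WINDOW_STEP = 16   # registers the window advances per call (illustrative)
    OVERLAP = 6        # outgoing registers shared with the callee (illustrative)

    def __init__(self, total_regs=138):
        self.regs = [0] * total_regs
        self.base = 0  # start of the current window

    def write_out(self, i, value):
        # caller places an argument in outgoing register i
        self.regs[self.base + self.WINDOW_STEP + i] = value

    def read_in(self, i):
        # callee reads incoming register i (the same physical register)
        return self.regs[self.base + i]

    def call(self):
        self.base += self.WINDOW_STEP   # slide the window forward

    def ret(self):
        self.base -= self.WINDOW_STEP   # slide it back on return


rw = RegisterWindows()
rw.write_out(0, 42)   # caller passes 42 as the first argument
rw.call()             # enter the subroutine: window slides
print(rw.read_in(0))  # callee sees 42 with no load or store
```

The point of the overlap is that nested calls reuse physical registers instead of spilling parameters to a stack in memory, which is exactly the subroutine-call cost the Berkeley design targeted.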
The document discusses RISC (reduced instruction set computers) architectures compared to CISC (complex instruction set computers) architectures. Some key points:
- RISCs aim to simplify the instruction set to allow for faster execution, while CISCs include more complex instructions closer to high-level languages.
- Studies show programs spend most time on simple operations like moves and branches, using simple addressing modes and local variables, informing the RISC approach.
- RISCs use load/store architectures, fixed-length instructions, delayed loading, and many registers to improve performance over CISCs.
- While RISCs have advantages in speed and simplicity, comparisons are complex and modern processors combine RISC and CISC techniques.
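The load/store principle from the list above can be sketched as a tiny interpreter for a hypothetical mini-ISA (opcodes and register names invented for illustration): only LOAD and STORE touch memory, and arithmetic works register-to-register.

```python
# Minimal load/store machine: only LOAD and STORE access memory;
# ADD operates purely on registers (hypothetical mini-ISA).
def run(program, memory):
    regs = {}
    for op, *args in program:
        if op == "LOAD":        # LOAD rd, addr
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":     # STORE rs, addr
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":       # ADD rd, ra, rb  (register-to-register only)
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


# mem[2] = mem[0] + mem[1] takes four instructions on a load/store ISA,
# where a memory-to-memory CISC might encode it as one ADD instruction.
mem = {0: 3, 1: 4, 2: 0}
run([("LOAD", "r1", 0),
     ("LOAD", "r2", 1),
     ("ADD",  "r3", "r1", "r2"),
     ("STORE", "r3", 2)], mem)
print(mem[2])  # 7
```

The trade is visible even in this toy: more instructions per task, but each one is simple, uniform, and easy to pipeline.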
Reduced instruction set computing, or RISC (pronounced 'risk', /ɹɪsk/), is a CPU design strategy based on the insight that a simplified instruction set provides higher performance when combined with a microprocessor architecture capable of executing those instructions using fewer microprocessor cycles per instruction.
This document describes formal verification of a pipelined CISC microprocessor modeled after the Intel IA32 instruction set using the UCLID term-level verifier. The objective was to understand UCLID's strengths and weaknesses for modeling hardware designs and the verification process. A pipelined Y86 processor implementation from a textbook was verified against its sequential reference model. The control logic was automatically translated to UCLID format. Modularity and automation were emphasized to maintain model fidelity during verification.
RISC (reduced instruction set computer) by LokmanArman
RISC
Reduced Instruction Set Computer
What Is RISC?
History of RISC
Characteristics of RISC
Five Design Principles of RISC
What Does RISC Actually Do?
Real-Life Uses of RISC in Computer Architecture
Computer Architecture & Organization
This document discusses RISC processors and compares them to CISC processors. It covers the history of RISC, including the development of RISC concepts in the 1970s. The key differences between RISC and CISC are that RISC uses fixed-length instructions that perform in one clock cycle, while CISC has variable-length instructions that may take multiple cycles. The document also outlines RISC design principles like simple instructions, register-to-register operations, and large register sets. Examples of popular RISC architectures like MIPS, SPARC, and ARM are provided.
RISC and CISC architectures take different approaches to processing instructions. CISC uses complex, multi-step instructions that operate directly on memory, requiring less code but more processing time per instruction. RISC breaks instructions into simple, single-clock operations that emphasize registers, requiring more code but allowing for faster, more consistent execution through pipelining. While CISC aims to minimize instructions, RISC aims to minimize processing time per instruction through simplified hardware and software.
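The trade-off described above ("less code but more time per instruction" versus "more code but faster instructions") comes down to the classic execution-time equation: time = instructions × cycles per instruction × clock period. The numbers below are made up purely to illustrate the arithmetic.

```python
# Back-of-the-envelope execution-time comparison (illustrative numbers only):
# total time = instruction count * cycles per instruction / clock rate.
def exec_time(instructions, cpi, clock_ghz):
    return instructions * cpi / (clock_ghz * 1e9)  # seconds


# A hypothetical CISC encoding: fewer instructions, more cycles each.
cisc = exec_time(instructions=1_000_000, cpi=4.0, clock_ghz=1.0)

# The equivalent RISC code: more instructions, ~1 cycle each when pipelined.
risc = exec_time(instructions=1_500_000, cpi=1.0, clock_ghz=1.0)

print(cisc / risc)  # with these numbers the RISC version wins despite more code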
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Verilog code for a 16-bit RISC processor, with full code for the ALU, program counter, instruction memory, data memory, and control unit
The document discusses the differences between RISC and CISC architectures. RISC architectures have lower complexity through hardwired logic and simpler instruction sets, which can result in higher performance. However, CISC architectures have more extensive instruction sets which allow hardware implementation of high-level functions. The trade-offs between hardware and software complexity must be considered, as well as industry factors, in evaluating RISC vs CISC. An optimal solution may be a hybrid architecture with a RISC core and CISC-like instructions.
Comparative Study of RISC AND CISC Architectures by Editor IJCATR
Comparing RISC and CISC in computer architecture research is not simple, because many researchers have worked on both architectures, which differ substantially in their underlying platforms and hardware. The chips used differ considerably, and many variants exist. This paper gives an architectural comparison between RISC and CISC, presents their advantages from a performance point of view, and shares ideas with new researchers.
RISC and CISC architectures evolved from different philosophies for optimizing computer performance given the constraints of early computing technologies. CISC emphasized complex instructions and efficient memory usage while RISC focused on simple instructions and fast execution. Over time, improvements blurred the lines as CISC adopted RISC techniques like pipelining and RISC grew more complex instructions. Modern processors integrate aspects of both to optimize performance for current software and hardware.
RISC - Reduced Instruction Set Computing by Tushar Swami
This document discusses RISC (Reduced Instruction Set Computer) architecture. It includes a member list, outline of topics to be covered, and acknowledgements. The main topics covered are what RISC is, the background and history of RISC, characteristics of RISC like simplified instructions and pipelining, differences between RISC and CISC, performance equations, and applications of RISC like in mobile systems, high-end computing, and ARM and MIPS architectures. It concludes that over time, the differences between RISC and CISC have blurred as they have adopted each other's strategies.
This document compares RISC and CISC processor architectures. It discusses that CISC processors have more complex instructions that can perform multiple operations, while RISC processors have simpler instructions that are optimized to complete in one clock cycle. CISC was developed earlier when memory was expensive, to reduce the number of instructions, while RISC focuses on increasing processor speed. RISC has advantages of faster execution and simpler hardware design, while CISC allows for more compact code.
Introducing Embedded Systems and the Microcontrollers by Ravikumar Tiwari
This document provides an overview of embedded systems and microcontrollers. It defines embedded systems as systems with dedicated software embedded in computer hardware for a specific purpose. An embedded system typically contains a microprocessor, memory, I/O components, and application software stored in ROM. It also includes an RTOS that manages tasks and resources. The document discusses RISC architecture and how pipelining improves processor efficiency by splitting instructions into stages that can run simultaneously.
This document discusses instruction-level parallelism (ILP) limitations. It covers ILP background using a MIPS example, hardware models that were studied including register renaming and branch/jump prediction assumptions. A study of ILP limitations found diminishing returns with larger window sizes and realizable processors are limited by complexity and power constraints. Simultaneous multithreading was explored as a technique to improve ILP but has its own design challenges. Today, x86 and ARM processors employ various ILP optimizations within pipeline constraints.
The document discusses the differences between CISC and RISC instruction set architectures. CISC aims to reduce storage usage and support compatibility, while making compilation easier and allowing complex assembly programming. RISC philosophy is to execute one instruction per clock cycle, use fixed-size instructions, only allow load/store instructions to access memory, and support high-level languages. While CISC was originally more common, the conclusion is that RISC provides more design simplicity, better pipelining, and is now preferable given cheaper memory.
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot by Slide_N
The document discusses Ocelot, a binary translation framework that allows architectures other than NVIDIA GPUs to execute programs written in PTX, an intermediate representation used by NVIDIA GPUs. It describes how Ocelot maps the PTX thread hierarchy to different architectures, uses translation techniques to hide memory latency, and emulates GPU data structures. It also provides details on the implementation of the translator and a case study of translating a PTX program to IBM Cell Processor assembly code.
The document discusses microprocessors, RISC, and CISC architectures. It provides the following key points:
1. A microprocessor, also known as the CPU, is the central processing unit of computers and electronic devices that contains components like transistors to carry out instructions.
2. RISC architectures aim to simplify instruction sets to maximize efficiency through pipelining, using simple addressing modes and instruction formats with complex operations as sequences of simple instructions.
3. CISC architectures contain large, complex instruction sets ranging from simple to specialized to make efficient use of memory and simplify compiler development by mapping directly to high-level languages.
RISC and CISC architectures have converged over time as processors have advanced. Originally, CISC emphasized complex instructions that could access memory directly, while RISC used simpler instructions but more registers. Now, CISC chips employ techniques like pipelining and multi-instruction execution. Meanwhile, RISC chips have more complex instructions and hardware. The differences have blurred as both styles adopt each other's strategies using newer technologies.
The document discusses several machine learning projects at NECST Research. It summarizes projects involving behavior identification in animals using models like XGBoost, muscle synergy identification using NMF and neural networks on FPGA, deep learning acceleration on embedded devices using HLS, spiking neural networks for robot simulation, CNN acceleration on FPGA using CONDOR, and the PRETZEL system for optimizing multiple similar ML models deployed on cloud platforms.
The document discusses parallelism and techniques to improve computer performance through parallel execution. It describes instruction level parallelism (ILP) where multiple instructions can be executed simultaneously through techniques like pipelining and superscalar processing. It also discusses processor level parallelism using multiple processors or processor cores to concurrently execute different tasks or threads.
Review paper on 32-BIT RISC processor with floating point arithmetic by IRJET Journal
This document reviews a proposed 32-bit RISC processor with floating point arithmetic. It discusses RISC and floating point concepts, reviews previous related work on RISC processor design, and proposes the design of a 32-bit RISC processor with the following key aspects:
- An instruction set with over 30 instructions in R-type, I-type, J-type, and I/O formats.
- A five-stage pipeline consisting of instruction fetch, decode, execution, memory/IO, and write-back stages.
- The inclusion of a floating point unit to support floating point arithmetic and avoid errors encountered in fixed point designs.
- Implementation in VHDL and
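The five-stage pipeline listed above (instruction fetch, decode, execution, memory/IO, write-back) can be visualized by computing which stage each instruction occupies in each cycle. This is a sketch of an ideal pipeline with no hazards or stalls; the stage names follow the review's design.

```python
# Cycle-by-cycle occupancy of an ideal five-stage pipeline (no hazards).
STAGES = ["IF", "ID", "EX", "MEM/IO", "WB"]


def schedule(n_instructions):
    """Return {cycle: [(instruction, stage), ...]} for an ideal pipeline."""
    table = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            # instruction i enters stage s in cycle i + s + 1
            table.setdefault(i + s + 1, []).append((f"I{i+1}", stage))
    return table


sched = schedule(3)
for cycle in sorted(sched):
    print(cycle, sched[cycle])
# 3 instructions drain in 5 + (3 - 1) = 7 cycles: after the pipeline
# fills, one instruction completes every cycle.
```

The final print makes the pipelining payoff concrete: latency per instruction is still five cycles, but steady-state throughput is one instruction per cycle.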
Advanced Scalable Decomposition Method with MPICH Environment for HPC by IJSRD
MPI (Message Passing Interface) has been used effectively in the high-performance computing community for years and is the main programming model. MPI is widely used to develop parallel programs on computing systems such as clusters, and as a major component of high-performance computing (HPC) environments it is becoming increasingly prevalent. MPI implementations typically equate an MPI process with an OS process, resulting in a decomposition technique where MPI processes are bound to physical cores. An integrated approach makes it possible to add more concurrency than the available parallelism while minimizing the overheads of context switches, scheduling, and synchronization; fibers are used to support multiple MPI processes inside a single operating-system process. There are three widely used MPI libraries: OpenMPI, MPICH2, and MVAPICH2. This paper works on decomposition techniques and integrates the MPI environment using MPICH2.
DESIGN AND ANALYSIS OF A 32-BIT PIPELINED MIPS RISC PROCESSOR by VLSICS Design
The document describes the design and analysis of a 32-bit pipelined MIPS RISC processor. A 6-stage pipeline is implemented, consisting of instruction fetch, instruction decode, register read, memory access, execute, and write back stages. Various techniques are used to optimize critical performance factors like power, frequency, area, and propagation delay. Power gating is applied to minimize power consumption, and deeper pipelining is used to increase speed. Simulation results show the pipeline consumes very low power of 0.129W, has a path delay of 11.180ns, and achieves a high frequency of 285.583MHz.
Dsdco IE: RISC and CISC architectures and design issues by Home
RISC is an alternative to the Complex Instruction Set Computing (CISC) architecture and is often considered the most efficient CPU architecture technology available today.
This document provides an overview of CISC and RISC architectures. CISC uses complex instructions that perform multiple low-level operations in one instruction, taking multiple clock cycles to execute. It has few general purpose registers. RISC uses simpler instructions that are executed in one clock cycle. It has many general purpose registers and only load and store instructions can access memory directly. Pipelining allows RISC processors to process multiple instructions simultaneously for improved efficiency compared to CISC which executes instructions sequentially.
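The claim above, that pipelining lets a RISC processor overlap instructions where a sequential machine cannot, has a standard closed form: n instructions on a k-stage pipeline take k + (n - 1) cycles instead of n * k. A short sketch of that arithmetic:

```python
# Ideal pipeline speedup over strictly sequential execution:
# sequential cost: n * k cycles; pipelined cost: k + (n - 1) cycles,
# where k is the number of pipeline stages (ignores hazards and stalls).
def pipeline_speedup(n, k):
    return (n * k) / (k + n - 1)


print(pipeline_speedup(n=1, k=5))      # a single instruction gains nothing
print(round(pipeline_speedup(n=1000, k=5), 2))  # long runs approach k
```

For large n the speedup approaches the stage count k, which is why one-cycle, uniform instructions (the RISC traits listed above) matter: they keep the pipeline full.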
IRJET- Design of Low Power 32- Bit RISC Processor using Verilog HDL by IRJET Journal
This document describes the design and implementation of a 32-bit reduced instruction set computer (RISC) processor using Verilog HDL. Key aspects include:
1. The processor architecture consists of a control unit, datapath unit, and memory unit. The control unit uses a finite state machine to control the datapath.
2. The datapath contains subunits like register file, ALU, and memory interface that perform arithmetic and logic operations.
3. The processor follows a Harvard architecture with separate program and data memory. It uses a single instruction single data execution model.
4. Operation involves 5 stages - instruction fetch, decode, execute, memory access, and write back. The control unit generates signals to coordinate
A 64-Bit RISC Processor Design and Implementation Using VHDL by Andrew Yoila
1. Introduction
Digital hardware plays a very important role in today's electronic and computer engineering products. Rapid growth and competition in the technology industry, the rising transistor count and speed of integrated circuits, and steep price declines driven by improvements in micro-electronics have brought computers into nearly every part of society, where almost any problem can be addressed with them. Many industries are looking for system developers with the skills and technical know-how to design program logic, and VHDL is one of the most popular design languages used to implement such tasks. A reduced instruction set computing (RISC) processor with BIST features plays a vital role in testing the circuits under test, which is important to quality assurance [1]. Although a reduced instruction set has few instructions, as the bit-processing size increases the test patterns become denser and the number of structural faults remains large. To enable most instructions to operate register-to-register, the arithmetic logic unit is studied and detailed test patterns are developed. This report is prepared with automated and controlled specific applications in mind. The design has a 33-instruction set with MICA architecture. The report focuses mainly on:
i. the RISC processor,
ii. the design,
iii. the architecture,
iv. the data path and the instruction set of the design, and
v. VHDL.
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw... by Dileep Bhandarkar
This is the paper that Dave Patterson referred to in his Turing Lecture.
Performance comparisons across different computer architectures cannot usually separate the architectural contribution from various implementation and technology contributions to performance. This paper compares an example implementation from the RISC and CISC architectural schools (a MIPS M/2000 and a Digital VAX 8700) on nine of the ten SPEC benchmarks. The organizational similarity of these machines provides an opportunity to examine the purely architectural advantages of RISC. The RISC approach offers, compared with the VAX, many fewer cycles per instruction but somewhat more instructions per program. Using results from a software monitor on the MIPS machine and a hardware monitor on the VAX, this paper shows that the resulting advantage in cycles per program ranges from slightly under a factor of 2 to almost a factor of 4, with a geometric mean of 2.7. It also demonstrates the correlation between cycles per instruction and relative instruction count.
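The "geometric mean of 2.7" quoted above is the standard way to aggregate per-benchmark speedup ratios, since ratios multiply rather than add. A sketch of the computation (the sample ratios below are made up, chosen only to fall in the paper's reported ~2x to ~4x range):

```python
import math

# Geometric mean: the conventional aggregate for per-benchmark ratios
# like cycles-per-program advantages. Sample ratios are illustrative.
def geomean(ratios):
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


sample = [1.9, 2.2, 2.5, 2.8, 3.1, 3.9]
print(round(geomean(sample), 2))
```

Unlike the arithmetic mean, the geometric mean is unchanged by which machine you treat as the baseline, which is why it is preferred for speedup ratios.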
the society has affected so many things in the society in which almost all problems can be solve using computers. Many industries
today are requesting for system developers that have the skills and technical knowhow of designing the program logics. VHDL is one
of the most popular design applications used by designer to implement such task. Reduce instruction set computing (RISC) processor
play a vital role with RISC AND BIST features which most dominants patterns can provide, in systems testing of the circuits below
the tests which is important to the quality component of testing [1]. Although the Reduced instruction set have few instructions sets, as
its bitâs processingâs sizes increase then the testâs patterns become denser and the structureâs faults is kept great. In view to enable the
Operation of the most instructions as registers to registers operation, Arithmetic logic unit is studied and a detail test patterns is being
develop. This report is prepaid keeping in mind where specific application is automated and controlled. This report has 33 instruction
set with MICA architecture. This report will focus mainly on the meaning of
i. RISC processor,
ii. the design,
iii. the architecture,
iv. the data part and the instruction set of the design.
v. VHDL.
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...Dileep Bhandarkar
Â
This is the paper that Dave Patterson referred to in his Turing Lecture.
Performance comparisons across different computer architectures cannot usually separate the architectural contribution from various implementation and technology contributions to performance. This paper compares an example implementation from the RISC and CISC architectural schools
(a MIPS M/2000 and a Digital VAX 8700) on nine of the ten
SPEC benchmarks. The organizational similarity of these
machines provides an opportunity to examine the purely
architect ural advantages of RISC. The RISC approach offers,
compared with VAX, many fewer cycles per instruction but somewhat more instructions per program. Using results from a software monitor on the MIPS machine and a hardware monitor on the VAX, this paper shows that the esulting advantage in cycles per program ranges from slightly
under a factor of 2 to almost a factor of 4, with a geometric
mean of 2,7. It also demonstrates the correlation between
cycles per instruction and relative instruction count.
What is Microcontroller, Microcontroller vs Microprocessor, Development/Classication of microcontrollers, Harvard vs. Princeton Architecture, RISC AND CISC CONTROLLERS
Features of RISC, Microcontroller for Embedded Systems
10 x86 PC Embedded Applications, Choosing a Microcontroller
Criteria for Choosing a Microcontroller, Mechatronics, and Microcontrollers, A brief history of the PIC microcontroller, PIC Microcontrollers, Feature: PIC16F877, Simplied Features.
This document discusses the history of CISC and RISC architecture designs. In the 1980s, CISC and RISC architectures differed in their instruction complexity and design constraints for desktop and server computing. CISC used complex instruction sets while RISC used reduced instruction sets. Over time, both architectures evolved with improvements in compiler technology, memory costs, and chip design. Now, CISC is commonly used in desktops and servers while RISC is used in applications requiring high performance like real-time systems. The key differences between CISC and RISC relate to performance, pricing strategies, and design approaches to instructions and addressing modes.
The document discusses RISC and CISC architectures, Flynn's taxonomy of parallel processing, and alternative parallel processing approaches. It notes that RISC machines use explicit load and store instructions to access memory, while CISC machines use variable-length instructions. Flynn's taxonomy categorizes architectures based on the number of instruction and data streams. The document also describes superscalar, VLIW, vector processors, distributed computing, dataflow computing, neural networks, and systolic arrays as parallel processing approaches.
This document discusses RISC vs CISC architectures and the Harvard and von Neumann computer architectures. It provides examples of multiplying two numbers in memory using CISC and RISC approaches. CISC uses complex instructions that perform multiple operations, while RISC breaks operations into simpler instructions. Harvard architecture separates program and data memory while von Neumann uses shared memory.
RISC and CISC architectures evolved from different philosophies but have converged over time. CISC aimed to optimize for simpler compilers by incorporating complex instructions while RISC focused on optimized hardware using reduced, uniform instruction sets. While CISC was better for early computers with slow memory, RISC emerged to improve performance. Advances now blur the lines as CISC uses pipelining and RISC supports more instructions, showing how the strategies have influenced each other in modern processors.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
i) The document discusses RISC (reduced instruction set computer) and CISC (complex instruction set computer) architectures. CISC architectures, like Intel IA-32, use more complex instructions to closely support high-level languages, but this leads to variable instruction lengths and complexity. RISC architectures emerged in the 1980s with simpler instruction sets to address these issues.
ii) Key characteristics of RISC include one cycle execution time per instruction through optimization and pipelining, and a large number of registers to reduce memory interactions. Early non-RISC/CISC designs encouraged powerful instructions and addressing modes to optimize for small memories and slow memory access times.
CS304PC:Computer Organization and Architecture UNIT V_merged_merged.pdfAsst.prof M.Gokilavani
Â
This document discusses RISC and CISC processors. It defines RISC as having a reduced instruction set with simple instructions that each take one clock cycle. CISC has a more complex instruction set that can take multiple cycles. The document outlines the characteristics and advantages/disadvantages of both RISC and CISC. It also discusses parallel processing techniques like pipelining and vector processing that improve processor throughput.
This document discusses RISC and CISC architectures and how they have evolved. It explains that RISC aims to simplify instructions while CISC combines operations. Both seek to improve CPU performance but RISC reduces cycles per instruction while CISC minimizes instructions per program. It then covers characteristics of RISC like simple decoding and CISC like complex decoding. Pipelining is described as arranging hardware to simultaneously execute instructions to improve processor performance. The document ends by detailing the typical stages in a RISC processor's instruction pipeline.
A New Direction for Computer Architecture Researchdbpublications
Â
This paper we suggest a different computing environment as a worthy new direction for computer architecture research: personal mobile computing, where portable devices are used for visual computing and personal communications tasks. Such a device supports in an integrated fashion all the functions provided to-day by a portable computer, a cellular phone, a digital camera and a video game. The requirements placed on the processor in this environment are energy efficiency, high performance for multimedia and DSP functions, and area efficient, scalable designs. We examine the architectures that were recently pro-posed for billion transistor microprocessors. While they are very promising for the stationary desktop and server workloads, we discover that most of them are un-able to meet the challenges of the new environment and provide the necessary enhancements for multimedia applications running on portable devices.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The document provides an introduction and overview of the Crusoe microprocessor developed by Transmeta Corp. Some key points:
- Crusoe uses a hybrid software/hardware approach, with software (Code Morphing) translating x86 binaries to native VLIW instructions at runtime.
- This decouples the x86 ISA from the underlying hardware, allowing the hardware to be simplified and changed without affecting software compatibility.
- The Crusoe processor uses a VLIW architecture with a 128-bit instruction word that can contain up to 4 "atoms" to be executed in parallel.
- Code Morphing software resides in ROM and acts as an emulator, caching translations in a translation cache for improved
Advanced computer architecture lesson 5 and 6Ismail Mukiibi
Â
The document discusses reduced instruction set computers (RISC) and compares them to complex instruction set computers (CISC). Key characteristics of RISC include simple, uniform instructions that are executed in one cycle; register-to-register operations with simple addressing modes; and a large number of registers to optimize register usage and minimize memory accesses. Studies show programs use simple operations, operands, and addressing modes most frequently, informing the RISC design which aims to efficiently support common cases through hard-wired, streamlined instructions.
The document provides an overview of RISC and CISC architectures. It discusses:
- RISC architectures utilize a small, highly optimized set of instructions (load/store architecture). Typical features include pipelining for one cycle execution, more registers, and simple addressing modes.
- CISC architectures have a more complex instruction set implemented through microcode. They support direct memory operations and fewer registers. Typical features include varying execution cycles and harder pipelining.
- Both aim to bridge the "semantic gap" between low-level hardware and high-level programming. RISC focuses on efficiency through simplified designs while CISC prioritizes compatibility through complex instructions.
The document provides an overview of the ARM Cortex-A8 processor, including its RISC-based superscalar design, 13-stage instruction pipeline, branch prediction features, and NEON SIMD capabilities. The Cortex-A8 achieves high performance while maintaining low power usage. It is widely used in mobile devices and consumer electronics due to its versatility and balance of performance and energy efficiency.
The document discusses computer architecture and organization. It provides questions and answers on topics such as:
- The definition of computer architecture and organization.
- The concept of layers in architectural design and their benefits.
- Differences between architecture and organization.
- Performance metrics and evaluating processor architecture.
- Examples of architectures like Pentium, servers, and the number of cycles for instructions on different processors.
This document compares and contrasts RISC and CISC processor architectures. It describes CISC as having complex instruction decoding logic to support multiple addressing modes, a small number of general purpose registers, and special purpose registers. RISC architectures are described as having a reduced instruction set with simple one-cycle instructions, large numbers of registers, and separate load and store instructions that operate only between registers and memory. The document outlines that while CISC was more efficient for early programming approaches, RISC has advantages as hardware and software technologies advanced.
This document discusses the design of a 32-bit MIPS RISC processor using VHDL. It begins by introducing the MIPS instruction set and architecture. It then reviews related work on designing MIPS processors using VHDL. The proposed work will implement a 32-bit MIPS processor with three instruction formats (R, I, and J types) and a 5-stage pipeline (fetch, decode, execute, memory, writeback). It concludes that modeling the processor in VHDL allows formal verification and that a 32-bit MIPS RISC processor could achieve high speed if implemented on an FPGA.
An Introduction to RISC Processors

We are going to describe how microprocessor manufacturers took a new look at processor architectures in the 1980s and started designing simpler but faster processors. We begin by explaining why chip designers turned their backs on the conventional complex instruction set computer (CISC), such as the 68K and the Intel 80x86 families, and started producing reduced instruction set computers (RISCs) such as MIPS and the PowerPC. RISC processors have simpler instruction sets than CISC processors (although this is a rather crude distinction between these families, as we shall soon see).
By the mid-1990s many of these so-called RISC processors were considerably more complex than some of the CISCs they replaced. That isn't a paradox. The RISC processor isn't really a cut-down computer architecture; it represents a new approach to architecture design. In fact, the distinction between CISC and RISC is now so blurred that virtually all processors have both RISC and CISC features.
The RISC Revolution
Before we look at the ARM, we describe the history and characteristics of RISC architecture. From the introduction of the microprocessor in the 1970s to the mid-1980s there was an almost unbroken trend towards more and more complex (you might even say baroque) architectures. Some of these architectures developed rather like a snowball rolling downhill: each advance in chip fabrication technology allowed designers to add more and more layers to the microprocessor's central core. Intel's 8086 family illustrates this trend particularly well, because Intel took their original 16-bit processor and added more features in each successive generation. This approach to chip design leads to cumbersome architectures and inefficient instruction sets, but it has the tremendous commercial advantage that end users don't have to pay for new software when they buy the latest reincarnation of a microprocessor.
A reaction against the trend toward greater architectural complexity began at IBM with their 801 architecture and continued at Berkeley, where Patterson and Ditzel coined the term RISC to describe a new class of architectures that reversed earlier trends in microcomputer design. According to popular wisdom, RISC architectures are streamlined versions of traditional complex instruction set computers. This notion is both misleading and dangerous, because it implies that RISC processors are in some way cruder versions of existing architectures. In brief, RISC architectures redeploy to better effect some of the silicon real estate used to implement complex instructions and elaborate addressing modes in conventional microprocessors of the 68000 and 8086 generation. The mnemonic "RISC" should really stand for regular instruction set computer.
Two factors influencing the architecture of first- and second-generation microprocessors were microprogramming and the desire to help compiler writers by providing ever more complex instruction sets. The latter is called closing the semantic gap (i.e., reducing the difference between high-level and low-level languages). By complex instructions we mean instructions like MOVE 12(A3,D0),D2 and ADD -(A6),D3 that carry out multi-step operations in a single machine-level instruction. The instruction MOVE 12(A3,D0),D2 generates an effective address by adding the contents of A3 to the contents of D0 plus the literal 12; the resulting address is used to access the source operand, which is loaded into register D2.
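To make this concrete, the following Python sketch mimics what the single CISC instruction does and the sequence of simple steps a load/store (RISC-style) machine would use instead. The register and memory values are invented purely for illustration.

```python
# Hypothetical register and memory contents, invented for illustration.
regs = {"A3": 0x1000, "D0": 0x20, "D2": 0}
memory = {0x1000 + 0x20 + 12: 0xCAFE}  # source operand lives at A3 + D0 + 12

# CISC view: one MOVE 12(A3,D0),D2 computes the address and loads the operand.
effective_address = regs["A3"] + regs["D0"] + 12
regs["D2"] = memory[effective_address]

# RISC view: the same work as a sequence of simple register operations plus
# one explicit load (roughly ADD tmp,A3,D0 ; ADD tmp,tmp,#12 ; LOAD D2,(tmp)).
tmp = regs["A3"] + regs["D0"]
tmp += 12
assert memory[tmp] == regs["D2"] == 0xCAFE
```

The point of the decomposition is that each RISC step is trivial to decode and execute, at the cost of a longer instruction sequence.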
Microprogramming reached its high point in the 1970s, when ferrite core memory had a long access time of 1 Ξs or more and high-speed semiconductor random access memory was very expensive. Quite naturally, computer designers used the slow main store to hold the complex instructions that made up the machine-level program. These machine-level instructions were interpreted by microcode in the much faster microprogram control store within the CPU. Today, main stores use semiconductor memory with an access time of 50 ns or less, and most of the advantages of microprogramming have evaporated. Indeed, the goal of a RISC architecture is to execute an instruction in a single machine cycle; a corollary is that complex multi-step instructions have no place in a RISC architecture. Before we look at RISC architectures, we have to describe some of the research that led to the search for better architectures.
Instruction Usage
Computer scientists carried out extensive research, beginning in the late 1970s and continuing for a decade or more, into the way in which computers execute programs. Their studies demonstrated that the relative frequency with which different classes of instruction are executed is not uniform: some types of instruction are executed far more frequently than others. Fairclough divided machine-level instructions into eight groups according to type and compiled the statistics shown in Table 1. The "mean value of instruction use" gives the percentage of times that instructions in each group are executed, averaged over both program types and computer architectures. These figures relate to early 8-bit processors.
Table 1 Instruction usage as a function of instruction type

Instruction group              1      2      3      4     5     6     7     8
Mean value of instruction use  45.28  28.73  10.75  5.92  3.91  2.93  2.05  0.4
The eight instruction groups in Table 1 are:
1. Data movement
2. Program flow control (i.e., branch, call, return)
3. Arithmetic
4. Compare
5. Logical
6. Shift
7. Bit manipulation
8. Input/output and miscellaneous
Table 1 convincingly demonstrates that the most common instruction type is the data
movement primitive of the form P := Q in a high-level language or MOVE P,Q in a
low-level language. Similarly, the program flow control group that includes both
conditional and unconditional branches (together with subroutine calls and returns)
forms the second most common group of instructions. Taken together, the data
movement and program flow control groups account for 74% of all instructions. A
corollary of this statement is that we can expect a large program to contain only 26%
of instructions that are not data movement or program flow control primitives.
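The figures quoted above follow directly from Table 1, as a small Python check shows:

```python
# The percentages quoted from Table 1 (Fairclough's instruction-usage study).
usage = {
    "data movement": 45.28,
    "program flow control": 28.73,
    "arithmetic": 10.75,
    "compare": 5.92,
    "logical": 3.91,
    "shift": 2.93,
    "bit manipulation": 2.05,
    "input/output and miscellaneous": 0.4,
}

top_two = usage["data movement"] + usage["program flow control"]
remainder = sum(usage.values()) - top_two
print(f"top two groups: {top_two:.2f}%")    # about 74%, as the text notes
print(f"everything else: {remainder:.2f}%") # about 26%
```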
An inescapable inference from such results is that processor designers might be better
employed devoting their time to optimizing the way in which machines handle
instructions in groups one and two, than in seeking new powerful instructions that are
seldom used. In the early days of the microprocessor, chip manufacturers went out of
their way to provide special instructions that were unique to their products. These
instructions were then heavily promoted by the company's sales force. Today, we can
see that their efforts should have been directed towards the goal of optimizing the
most frequently used instructions. RISC architectures have been designed to exploit
the programming environment in which most instructions are data movement or
program control instructions.
Another aspect of computer architecture that was investigated was the optimum size
of literal operands (i.e., constants). Tanenbaum reported the remarkable result that
56% of all constant values lie in the range -15 to +15 and that 98% of all constants lie
in the range -511 to +511. Consequently, the inclusion of a 5-bit constant field in an
instruction would cover over half the occurrences of a literal. RISC architectures have sufficiently long instructions to include a literal field as part of the instruction that caters for the majority of literals.
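The field-size argument reduces to a simple range check. The sketch below assumes a two's-complement encoding of the literal field (the studies quoted above do not fix the encoding, so this is an assumption for illustration):

```python
def fits_in_signed_field(value: int, bits: int) -> bool:
    """True if value is representable in a two's-complement field of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi

# Tanenbaum's ranges from the text: -15..+15 fits a 5-bit field, and
# -511..+511 fits a 10-bit field.
assert all(fits_in_signed_field(v, 5) for v in range(-15, 16))
assert all(fits_in_signed_field(v, 10) for v in range(-511, 512))
assert not fits_in_signed_field(512, 10)
```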
Programs use subroutines heavily, and an effective architecture should optimize the
way in which subroutines are called, parameters passed to and from subroutines, and
workspace allocated to local variables created by subroutines. Research showed that
in 95% of cases twelve words of storage are sufficient for parameter passing and local storage. A computer with twelve registers should be able to handle all the operands required by most subroutines without accessing main store. Such an arrangement would reduce the processor-memory bus traffic associated with subroutine calls.
Characteristics of RISC Architectures
Having described the ingredients that go into an efficient architecture, we now look at
the attributes of first generation RISCs before covering RISC architectures in more
detail. The characteristics of an efficient RISC architecture are:
1. RISC processors have sufficient on-chip registers to overcome the worst effects of the processor-memory bottleneck. Registers can be accessed more rapidly than off-chip main store. Although today's processors rely heavily on fast on-chip cache memory to increase throughput, registers still offer the highest performance.
2. RISC processors have three-address, register-to-register architectures with instructions of the form OPERATION Ra,Rb,Rc, where Ra, Rb, and Rc are general-purpose registers.
3. Because subroutine calls are so frequently executed, (some) RISC architectures make provision for the efficient passing of parameters between subroutines.
4. Instructions that modify the flow of control (e.g., branch instructions) are implemented efficiently because they comprise about 20 to 30% of a typical program.
5. RISC processors aim to execute one instruction per clock cycle. This goal imposes a limit on the maximum complexity of instructions.
6. RISC processors don't attempt to implement infrequently used instructions. Complex instructions waste silicon real estate and conflict with the requirement of point 8. Moreover, the inclusion of complex instructions increases the time taken to design, fabricate, and test a processor.
7. A corollary of point 5 is that an efficient architecture should not be microprogrammed, because microprogramming interprets a machine-level instruction by executing microinstructions. In the limit, a RISC processor is close to a microprogrammed architecture in which the distinction between machine cycle and microcode has vanished.
8. An efficient processor should have a single instruction format (or at least very few formats). A typical CISC processor such as the 68000 has variable-length instructions (e.g., from 2 to 10 bytes). By providing a single instruction format, the decoding of a RISC instruction into its component fields can be performed by a minimum level of decoding logic. It follows that a RISC's instruction length should be sufficient to accommodate the operation code field and one or more operand fields. Consequently, a RISC processor may not utilize memory space as efficiently as a conventional CISC microprocessor.
Two fundamental aspects of the RISC architecture that we cover later are its register
set and the use of pipelining. Multiple overlapping register windows were
implemented by the Berkeley RISC to reduce the overhead incurred by transferring
parameters between subroutines. Pipelining is a mechanism that permits the
overlapping of instruction execution (i.e., internal operations are carried out in
parallel). Many of the features of RISC processors are not new and had been employed long before the advent of the microprocessor. The RISC revolution
happened when all these performance-enhancing techniques were brought together
and applied to microprocessor design.
The Berkeley RISC
Although many CISC processors were designed by semiconductor manufacturers, one
of the first RISC processors came from the University of California at Berkeley. The
Berkeley RISC wasn't a commercial machine, although it had a tremendous impact on
the development of later RISC architectures. Figure 1 describes the format of a
Berkeley RISC instruction. Each of the 5-bit operand fields (Destination, Source 1,
Source 2) permits one of 32 internal registers to be accessed.
Figure 1 Format of the Berkeley RISC instruction
The single-bit set condition code field, Scc, determines whether the condition code bits are updated after the execution of an instruction. The 14-bit Source 2 field has two functions: if the IM (immediate) bit is 0, the Source 2 field specifies one of 32 registers; if the IM bit is 1, the Source 2 field provides a 13-bit literal operand.
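Extracting these fields is a matter of a few shifts and masks. The precise bit positions are defined by Figure 1; the ordering assumed in this Python sketch (opcode in the top 7 bits, then Scc, Destination, Source 1, IM, and Source 2 in the low 13 bits) is an assumption made for illustration only.

```python
def decode(word: int) -> dict:
    """Split a 32-bit instruction word into the fields described in the text.
    Field ordering is assumed, not taken from Figure 1."""
    s2 = word & 0x1FFF            # low 13 bits: register number or literal
    im = (word >> 13) & 1         # IM = 1 means Source 2 holds a literal
    if im and (s2 & 0x1000):      # sign-extend a negative 13-bit literal
        s2 -= 0x2000
    return {
        "opcode": (word >> 25) & 0x7F,
        "scc":    (word >> 24) & 1,
        "dest":   (word >> 19) & 0x1F,
        "src1":   (word >> 14) & 0x1F,
        "imm":    bool(im),
        "src2":   s2,
    }

# IM = 0: Source 2 names register 5; IM = 1: Source 2 is the literal -1.
reg_form = decode((3 << 19) | (4 << 14) | 5)
imm_form = decode((3 << 19) | (4 << 14) | (1 << 13) | 0x1FFF)
assert (reg_form["dest"], reg_form["src1"], reg_form["src2"]) == (3, 4, 5)
assert imm_form["imm"] and imm_form["src2"] == -1
```

Whatever the actual field order, the fixed 32-bit format means decoding never has to look beyond one word, which is exactly the decoding-logic saving argued for above.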
Since five bits are allocated to each operand field, it follows that this RISC has 2^5 =
32 internal registers. This last statement is emphatically not true, since the Berkeley
RISC has 138 user-accessible general-purpose internal registers. The discrepancy
between the number of registers directly addressable and the actual number of
registers arises from a mechanism called windowing that gives the programmer a
view of only a subset of all registers at any instant. Register R0 is
hardwired to contain the constant zero. Specifying R0 as an operand is the same as
specifying the constant 0.
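These field widths imply a very simple decoding scheme. The sketch below splits a 32-bit instruction word into its component fields. The exact bit positions (opcode in the top seven bits, then Scc, destination, source 1, and the 14-bit Source 2 field consisting of IM plus 13 bits) are an assumption for illustration, inferred from the field widths given above rather than taken from the documented Berkeley layout.

```python
# Assumed layout (an illustration, not the documented encoding):
# [31:25] opcode, [24] Scc, [23:19] dest, [18:14] source 1,
# [13] IM, [12:0] source 2 (register number if IM == 0, literal if IM == 1).

def decode(word):
    """Split a 32-bit instruction word into its component fields."""
    opcode = (word >> 25) & 0x7F
    scc    = (word >> 24) & 0x1
    dest   = (word >> 19) & 0x1F
    src1   = (word >> 14) & 0x1F
    im     = (word >> 13) & 0x1
    src2   = word & 0x1FFF
    return opcode, scc, dest, src1, im, src2

def encode(opcode, scc, dest, src1, im, src2):
    """Pack the fields back into a 32-bit word."""
    return ((opcode << 25) | (scc << 24) | (dest << 19) |
            (src1 << 14) | (im << 13) | src2)
```

Because every instruction shares this single format, the decoder is nothing more than a handful of fixed shifts and masks, which is precisely the "minimum level of decoding logic" referred to above.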
Register Windows
An important feature of the Berkeley RISC architecture is the way in which it
allocates new registers to subroutines; that is, when you call a subroutine, you get
some new registers. If you can create 12 registers out of thin air when you call a
subroutine, each subroutine will have its own workspace for temporary variables,
thereby avoiding relatively slow accesses to main store.
Although only 12 or so registers are required by each invocation of a subroutine, the
successive nesting of subroutines rapidly increases the total number of on-chip
registers assigned to subroutines. You might think that any attempt to dedicate a set of
registers to each new procedure is impractical, because the repeated calling of nested
subroutines will require an unlimited amount of storage. Subroutines can indeed be
nested to any depth, but research has demonstrated that on average subroutines are not
nested to any great depth over short periods. Consequently, it is feasible to adopt a
modest number of local register sets for a sequence of nested subroutines.
Figure 2 provides a graphical representation of the execution of a typical program in
terms of the depth of nesting of subroutines as a function of time. The trace goes up
each time a subroutine is called and down each time a return is made. If subroutines
were never called, the trace would be a horizontal line. This figure demonstrates
that even though subroutines may be nested to considerable depths, there are periods
or runs of subroutine calls and returns that do not require a nesting level of greater
than about five.
Figure 2 Depth of subroutine nesting as a function of time
A mechanism for implementing local variable work space for subroutines adopted by
the designers of the Berkeley RISC is to support up to eight nested subroutines by
providing on-chip work space for each subroutine. Any further nesting forces the CPU
to dump registers to main memory, as we shall soon see.
Memory space used by subroutines can be divided into four types:
Global space Global space is directly accessible by all subroutines and holds
constants and data that may be required from any point within the program. Most
conventional microprocessors have only global registers.
Local space Local space is private to the subroutine. That is, no other subroutine can
access the current subroutine's local address space from outside the subroutine. Local
space is employed as working space by the current subroutine.
Imported parameter space Imported parameter space holds the parameters imported
by the current subroutine from its parent that called it. In Berkeley RISC terminology
these are called the high registers.
Exported parameter space Exported parameter space holds the parameters exported
by the current subroutine to its child. In RISC terminology these are called the low
registers.
Windows and Parameter Passing
One of the reasons for the high frequency of data movement operations is the need to
pass parameters to subroutines and to receive them from subroutines.
The Berkeley RISC architecture deals with parameter passing by means of multiple
overlapped windows. A window is the set of registers visible to the current subroutine.
Figure 3 illustrates the structure of the Berkeley RISC's overlapping windows. Only
three consecutive windows (i-1, i, i+1) of the 8 windows are shown in Figure 3. The
vertical columns represent the registers seen by the corresponding window. Each
window sees 32 registers, but they aren't all the same 32 registers.
The Berkeley RISC has a special-purpose register called the window pointer, WP, that
indicates the current active window. Suppose that the processor is currently using
the ith window set. In this case the WP contains the value i. The registers in each of
the 8 windows are divided into four groups shown in Table 2.
Table 2 Berkeley RISC register types
Register name Register type
R0 to R9      The global register set, always accessible from every window.
R10 to R15    Six registers used by the subroutine to receive parameters from its parent and to return results to that parent.
R16 to R25    Ten local registers accessed only by the current subroutine; they cannot be accessed by any other subroutine.
R26 to R31    Six registers used by the subroutine to pass parameters to and from its own child (i.e., any subroutine called by itself).
All windows consist of 32 addressable registers, R0 to R31. A Berkeley RISC
instruction of the form ADD R3,R12,R25 implements [R25] ← [R3] + [R12], where
R3 lies within the window's global address space, R12 lies within its import from (or
export to) parent subroutine space, and R25 lies within its local address space. RISC
arithmetic and logical instructions always involve 32-bit values (there are no 8-bit or
16-bit operations).
The Berkeley RISC's subroutine call is CALLR Rd,<address> and is similar to a
typical CISC instruction BSR <address>. Whenever a subroutine is invoked
by CALLR Rd,<address>, the contents of the window pointer are incremented by
1 and the current value of the program counter is saved in register Rd of the new
window. The Berkeley RISC doesn't employ a conventional stack in external main
memory to save subroutine return addresses.
Figure 3 Berkeley windowed register sets
Once a new window has been invoked (in Figure 3 this is window i), the new
subroutine sees a different set of registers from the previous window. Global registers R0
to R9 are an exception because they are common to all windows. Register R10 of the
child (i.e., called) subroutine corresponds to (i.e., is the same as) register R26 of the
calling (i.e., parent) subroutine. Suppose you wish to send a parameter to a subroutine.
If the parameter is in R10 and you call a subroutine, register R26 in this subroutine
will contain the parameter. There hasn't been a physical transfer of data because
register R26 in the current window is simply register R10 in the previous window.
Figure 4 Relationship between register number, window number, and register address
The physical arrangement of the Berkeley RISC's window system is given in Figure 4.
On the left hand side of the diagram is the actual register array that holds all the on-
chip general-purpose registers. The eight columns associated with windows 0 to 7
demonstrate how each window is mapped onto the physical memory array on the chip
and how the overlapping regions are organized. The windows are logically arranged
in a circular fashion so that window 0 follows window 7 and window 7 precedes
window 0. For example, if the current window pointer is 3 and you access register
R25, location 74 is accessed in the register file. However, if you access register R25
when the window pointer is 7, you access location 137.
The total number of physical registers required to implement the Berkeley windowed
register set is:
10 global + 8 × 10 local + 8 × 6 parameter-transfer registers = 138 registers.
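The register file arithmetic of Figures 3 and 4 can be sketched in a few lines of code. The mapping convention below (window w's registers R10 to R25 starting at physical register 10 + 16w, with the low registers wrapping into the next window's high registers) is an assumption chosen for illustration; published descriptions of the Berkeley RISC number the windows and physical locations in slightly different ways.

```python
NWINDOWS = 8

def physical(window, r):
    """Map logical register r (0-31) in a given window to a physical register."""
    if r < 10:                       # R0-R9: global, common to all windows
        return r
    if r < 26:                       # R10-R25: high parameters plus locals
        return 10 + 16 * window + (r - 10)
    # R26-R31: low parameters, shared with the next window's R10-R15
    return 10 + 16 * ((window + 1) % NWINDOWS) + (r - 26)

# Every (window, register) pair maps into exactly 138 physical registers.
all_regs = {physical(w, r) for w in range(NWINDOWS) for r in range(32)}
print(len(all_regs))                 # 138
```

Note how the overlap falls out of the arithmetic: window w's R26 is the same physical register as window w+1's R10, and the modulo operation makes window 7's low registers wrap around to window 0, giving the circular arrangement described above.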
Window Overflow
Unfortunately, the total quantity of on-chip resources of any processor is finite and, in
the case of the Berkeley RISC, the registers are limited to 8 windows. If subroutines
are nested to a depth greater than or equal to 7, window overflow is said to occur, as
there is no longer a new window for the next subroutine invocation. When an
overflow takes place, the only thing left to do is to employ external memory to hold
the overflow data. In practice the oldest window is saved rather than the new window
created by the subroutine just called.
If the number of subroutine returns minus the number of subroutine calls exceeds 8,
window underflow takes place. Window underflow is the converse of window
overflow: the youngest window saved in main store must be restored to an on-chip
window. A considerable amount of research was carried out into dealing with window
overflow efficiently. However, the imaginative use of windowed register sets in the
Berkeley RISC was not adopted by many of the later RISC architectures. Modern
RISC processors generally have a single set of 32 general-purpose registers.
RISC Architecture and Pipelining
We now describe pipelining, one of the most important techniques for increasing the
throughput of a digital system. Pipelining exploits the regular structure of a RISC to
carry out internal operations in parallel.
Figure 5 illustrates the machine cycle of a hypothetical microprocessor executing an
ADD P instruction (i.e., [A] ← [A] + [M(P)], where A is an on-chip general-purpose
register and P is a memory location). The instruction is executed in five phases:
Instruction fetch Read the instruction from the system memory and increment the
program counter.
Instruction decode Decode the instruction read from memory during the previous
phase. The nature of the instruction decode phase is dependent on the complexity of
the instruction encoding. A regularly encoded instruction might be decoded in a few
nanoseconds with two levels of gating whereas a complex instruction format might
require ROM-based look-up tables to implement the decoding.
Operand fetch The operand specified by the instruction is read from the system
memory or an on-chip register and loaded into the CPU.
Execute The operation specified by the instruction is carried out.
Operand store The result obtained during the execution phase is written into the
operand destination. This may be an on-chip register or a location in external memory.
Figure 5 Instruction Execution
Each of these five phases may take a specific time (although the time taken would
normally be an integer multiple of the system's master clock period). Some
instructions require less than five phases; for example, CMP R1,R2 compares R1 and
R2 by subtracting R1 from R2 to set the condition codes and does not need an operand
store phase.
The inefficiency in the arrangement of Figure 5 is immediately apparent. Consider
the execution phase of instruction interpretation. This phase might take one-fifth of an
instruction cycle, leaving the instruction execution unit idle for the remaining 80% of
the time. The same argument applies to the other functional units of the processor, which
also lie idle for 80% of the time. A technique called instruction pipelining can be
employed to increase the effective speed of the processor by overlapping in time the
various stages in the execution of an instruction. In the simplest of terms, a pipelined
processor executes instruction i while fetching instruction i + 1 at the same time.
The way in which a RISC processor implements pipelining is described in Figure 6.
The RISC processor executes the instruction in four steps or phases: instruction fetch
from external memory, operand fetch, execute, and operand store (we're using a 4-
stage system because a separate "instruction decode" phase isn't normally necessary).
The internal phases take approximately the same time as the instruction fetch, because
these operations take place within the CPU itself and operands are fetched from and
stored in the CPU's own register file. Instruction 1 in Figure 6 begins in time slot 1
and is completed at the end of time slot 4.
Figure 6 Pipelining and instruction overlap
In a non-pipelined processor, the next instruction doesn't begin until the current
instruction has been completed. In the pipelined system of Figure 6, the instruction
fetch phase of instruction 2 begins in time slot 2, at the same time that the operand is
being fetched for instruction 1. In time slot 3, different phases of instructions 1, 2, and
3 are being executed simultaneously. In time slot 4, all functional units of the system
are operating in parallel and an instruction is completed in every time slot thereafter.
An n-stage pipeline can increase throughput by up to a factor of n.
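The up-to-a-factor-of-n claim is easy to check numerically. Under the idealized assumptions above (one phase per cycle, no bubbles), a k-stage pipeline completes N straight-line instructions in k + (N - 1) cycles rather than kN; the sketch below compares the two.

```python
def cycles(n_instructions, stages, pipelined=True):
    """Total cycles to run a straight-line sequence, one phase per cycle."""
    if pipelined:
        # The first instruction fills the pipeline; thereafter one
        # instruction completes in every cycle.
        return stages + (n_instructions - 1)
    return stages * n_instructions

# The speedup approaches the stage count as the run length grows.
for n in (4, 100, 10_000):
    speedup = cycles(n, 4, pipelined=False) / cycles(n, 4)
    print(n, round(speedup, 2))
```

For short runs the fill time of the pipeline dominates and the speedup is well below 4; only long runs of uninterrupted instructions approach the ideal factor, which is why the bubbles discussed next matter so much.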
Pipeline Bubbles
A pipeline is an ordered structure that thrives on regularity. At any stage in the
execution of a program, a pipeline contains components of two or more instructions at
varying stages in their execution. Consider Figure 7 in which a sequence of
instructions is being executed in a 4-stage pipelined processor. When the processor
encounters a branch instruction, the following instruction is no longer found at the
next sequential address but at the target address in the branch instruction. The
processor is forced to reload its program counter with the value provided by the
branch instruction. This means that all the useful work performed by the pipeline must
now be thrown away, since the instructions immediately following the branch are not
going to be executed.
When information in a pipeline is rejected or the pipeline is held up by the
introduction of idle states, we say that a bubble has been introduced.
Figure 7 The pipeline bubble caused by a branch
As we have already stated, program control instructions are very frequent.
Consequently, any realistic processor using pipelining must do something to
overcome the problem of bubbles caused by instructions that modify the flow of
control (branch, subroutine call and return). The Berkeley RISC reduces the effect of
bubbles by refusing to throw away the instruction following a branch. This
mechanism is called a delayed jump or a branch-and-execute technique because the
instruction immediately after a branch is always executed. Consider the effect of the
following sequence of instructions:
ADD R1,R2,R3   [R3] ← [R1] + [R2]
JMPX N         [PC] ← N             Goto address N
ADD R2,R4,R5   [R5] ← [R2] + [R4]   This is executed
ADD R7,R8,R9                        Not executed because the branch is taken
The processor calculates R5 := R2 + R4 before executing the branch. This sequence of
instructions is most strange to the eyes of a conventional assembly language
programmer, who is not accustomed to seeing an instruction executed after a branch
has been taken.
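A toy interpreter makes the delay-slot behaviour concrete. The instruction tuples, register names, and the two-operation instruction set below are invented purely for illustration; the point is that the instruction after a taken jump still executes before control transfers.

```python
def run(program, regs, nsteps=10):
    """Execute a tiny program with delayed-branch semantics: a jump
    takes effect only after the instruction in its delay slot runs."""
    pc = 0
    pending = None                      # branch target waiting one slot
    for _ in range(nsteps):
        if pc >= len(program):
            break
        op, *args = program[pc]
        next_pc = pc + 1
        if pending is not None:         # delay slot just ran; branch now
            next_pc, pending = pending, None
        if op == "ADD":
            s1, s2, d = args
            regs[d] = regs[s1] + regs[s2]
        elif op == "JMP":
            pending = args[0]           # effective after one more instruction
        pc = next_pc
    return regs

regs = run([("ADD", "R1", "R2", "R3"),
            ("JMP", 4),
            ("ADD", "R2", "R4", "R5"),  # delay slot: still executed
            ("ADD", "R7", "R8", "R9"),  # skipped: the branch is taken
            ("ADD", "R3", "R5", "R6")],
           {f"R{i}": i for i in range(10)})
print(regs["R5"], regs["R9"])           # → 6 9
```

R5 receives R2 + R4 even though the jump precedes that instruction, while R9 keeps its initial value because the instruction that would have written it is skipped.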
Unfortunately, it's not always possible to arrange a program in such a way as to
include a useful instruction immediately after a branch. Whenever this happens, the
compiler must introduce a no operation instruction, NOP, after the branch and accept
the inevitability of a bubble. Figure 8 demonstrates how a RISC processor implements
a delayed jump. The branch described in Figure 8 is a computed branch whose target
address is calculated during the execute phase of the instruction cycle.
Figure 8 Delayed branch
Another problem caused by pipelining is data dependency in which certain sequences
of instructions run into trouble because the current operation requires a result from the
previous operation and the previous operation has not yet left the pipeline. Figure 9
demonstrates how data dependency occurs.
Figure 9 Data dependency
Suppose a programmer wishes to carry out the apparently harmless calculation
X := (A + B) AND (A + B - C).
Assuming that A, B, C, X, and two temporary values, T1 and T2, are in registers in
the current window, we can write:
ADD A,B,T1    [T1] ← [A] + [B]
SUB T1,C,T2   [T2] ← [T1] - [C]
AND T1,T2,X   [X] ← [T1] ∧ [T2]
Instruction i + 1 in Figure 9 begins execution during the operand fetch phase of the
previous instruction. However, instruction i + 1 cannot continue on to its operand
fetch phase, because the very operand it requires does not get written back to the
register file for another two clock cycles. Consequently a bubble must be introduced
in the pipeline while instruction i + 1 waits for its data. In a similar fashion, the logical
AND operation also introduces a bubble as it too requires the result of a previous
operation which is in the pipeline.
Figure 10 demonstrates a technique called internal forwarding designed to overcome
the effects of data dependency. The following sequence of operations is to be
executed.
1. ADD R1,R2,R3   [R3] ← [R1] + [R2]
2. ADD R4,R5,R6   [R6] ← [R4] + [R5]
3. ADD R3,R4,R7   [R7] ← [R3] + [R4]
4. ADD R7,R1,R8   [R8] ← [R7] + [R1]
Figure 10 Internal forwarding
In this example, instruction 3 (i.e., ADD R3,R4,R7) uses an operand generated by
instruction 1 (i.e., the contents of register R3). Because of the intervening instruction
2, the destination operand generated by instruction 1 has time to be written into the
register file before it is read as a source operand by instruction 3.
Instruction 3 generates a destination operand R7 that is required as a source operand
by the next instruction. If the processor were to read the source operand requested by
instruction 4 from the register file, it would see the old value of R7. By means of
internal forwarding the processor transfers R7 from instruction 3's execution unit
directly to the execution unit of instruction 4 (see Figure 10).
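The forwarding decision itself is just a comparison of register numbers. The sketch below (an illustration of the idea, not the Berkeley RISC's actual control logic) flags any source operand that must be forwarded from the instruction immediately ahead of it; in this 4-stage model a result written two or more instructions earlier has already reached the register file, so only adjacent instructions need forwarding.

```python
def forwarding_plan(instructions):
    """For each instruction (dest, sources), list the source registers
    that must be forwarded from the immediately preceding instruction."""
    plan = []
    for i, (dest, sources) in enumerate(instructions):
        if i == 0:
            plan.append([])
            continue
        prev_dest = instructions[i - 1][0]
        plan.append([s for s in sources if s == prev_dest])
    return plan

# The four-instruction example from the text:
prog = [("R3", ("R1", "R2")),   # 1. ADD R1,R2,R3
        ("R6", ("R4", "R5")),   # 2. ADD R4,R5,R6
        ("R7", ("R3", "R4")),   # 3. ADD R3,R4,R7 - R3 written early enough
        ("R8", ("R7", "R1"))]   # 4. ADD R7,R1,R8 - needs R7 forwarded
print(forwarding_plan(prog))    # → [[], [], [], ['R7']]
```

Only instruction 4 needs forwarding, matching the analysis above: the intervening instruction 2 gives instruction 1's result time to reach the register file before instruction 3 reads it.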
Accessing External Memory in RISC Systems
Conventional CISC processors have a wealth of addressing modes that are used in
conjunction with memory reference instructions. For example, the 68020 implements
ADD D0,-(A5) which adds the contents of D0 to the top of the stack pointed at by A5
and then pushes the result on to this stack.
In their ruthless pursuit of efficiency, the designers of the Berkeley RISC severely
restricted the way in which it accesses external memory. The Berkeley RISC permits
only two types of reference to external memory: a load and a store. All arithmetic and
logical operations carried out by the RISC apply only to source and destination
operands in registers. Similarly, the Berkeley RISC provides a limited number of
addressing modes with which to access an operand in the main store. It's not hard to
find the reason for these restrictions on external memory accesses: an external
memory reference takes longer than an internal operation. We now discuss some of
the general principles of Berkeley RISC load and store instructions.
Consider the load register operation of the form LDXW (Rx)S2,Rd that has the effect
[Rd] ← [M([Rx] + S2)]. The operand address is the contents of register Rx plus the
offset S2. Figure 11 demonstrates the sequence of
actions performed during the execution of this instruction. During the source fetch
phase, register Rx is read from the register file and used to calculate the effective
address of the operand in the execute phase. However, the processor can't progress
beyond the execute phase to the store operand phase, because the operand hasn't been
read from the main store. Therefore the main store must be accessed to read the
operand and a store operand phase executed to load the operand into destination
register Rd. Because memory accesses introduce bubbles into the pipeline, they are
avoided wherever possible.
Figure 11 The load operation
The Berkeley RISC implements two basic addressing modes: indexed and program
counter relative. All other addressing modes can (and must) be synthesized from these
two primitives. The effective address in the indexed mode is given by:
EA = [Rx] + S2
where Rx is the index register (one of the 32 general purpose registers accessible by
the current subroutine) and S2 is an offset. The offset can be either a general-purpose
register or a 13-bit constant.
The effective address in the program counter relative mode is given by:
EA = [PC] + S2
where PC represents the contents of the program counter and S2 is an offset as above.
These addressing modes provide quite a powerful toolbox: zero, one, or two pointers
and a constant offset. If you wonder how we can use an addressing mode without an
index (i.e., pointer) register, remember that R0 in the global register set permanently
contains the constant 0. For example, LDXW (R12)R0,R3 uses simple address
register indirect addressing, whereas LDXW (R0)123,R3 uses absolute addressing
(i.e., memory location 123).
There's a difference between addressing modes permitted by load and store
operations. A load instruction permits the second source, S2, to be either an
immediate value or a second register, whereas a store instruction permits S2 to be a
13-bit immediate value only. This lack of symmetry between the load and store
addressing modes is because a "load base+index" instruction requires a register file
with two ports, whereas a "store base+index" instruction requires a register file
with three ports. Two-ported memory allows two simultaneous accesses. Three-ported
memory allows three simultaneous accesses and is harder to design.
Figure 1 defines just two basic Berkeley RISC instruction formats. The short
immediate format provides a 5-bit destination, a 5-bit source 1 operand and a 14-bit
short source 2 operand. The short immediate format has two variations: one that
specifies a 13-bit literal for source 2 and one that specifies a 5-bit source 2 register
address. Bit 13 specifies whether the source 2 operand is a 13-bit literal or a 5-bit
register pointer.
The long immediate format provides a 19-bit source operand by concatenating the two
source operand fields. Thirteen-bit and 19-bit immediate fields may sound a little
strange at first sight. However, since 13 + 19 = 32, the Berkeley RISC permits a full
32-bit value to be loaded into a window register in two operations. In the next section
we will discover that the ARM processor deals with literals in a different way. A
typical CISC microprocessor might take the same number of instruction bits to
perform the same action (i.e., a 32-bit operation code field followed by a 32-bit
literal).
The following describes some of the addressing modes that can be synthesized from
the RISC's basic addressing modes.
1. Absolute addressing
EA = 13-bit offset
Implemented by setting Rx = R0 = 0, S2 = 13-bit constant.
2. Register indirect
EA = [Rx]
Implemented by setting S2 = R0 = 0.
3. Indexed addressing
EA = [Rx] + Offset
Implemented by setting S2 = 13-bit constant.
4. Two-dimensional byte addressing (i.e., byte array access)
EA = [Rx] + [Ry]
Implemented by setting S2 = [Ry].
This mode is available only for load instructions.
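A few lines of code show how each synthesized mode collapses to the single primitive EA = [Rx] + S2, with R0 hardwired to zero. The register-file contents below are invented for illustration.

```python
def effective_address(regs, rx, s2, s2_is_register=False):
    """EA = [Rx] + S2, where S2 is a 13-bit constant or a register number."""
    offset = regs[s2] if s2_is_register else s2
    return regs[rx] + offset

regs = {0: 0, 12: 5000, 13: 24}        # R0 always holds 0 (illustrative values)

# Absolute:          Rx = R0, S2 = constant
print(effective_address(regs, 0, 123))                      # → 123
# Register indirect: S2 = R0
print(effective_address(regs, 12, 0, s2_is_register=True))  # → 5000
# Indexed:           Rx = pointer, S2 = constant offset
print(effective_address(regs, 12, 8))                       # → 5008
# Two-dimensional:   Rx = pointer, S2 = second register (loads only)
print(effective_address(regs, 12, 13, s2_is_register=True)) # → 5024
```

The same one-adder datapath serves all four modes; the "missing" modes of a CISC are recovered simply by choosing R0 or a constant for one of the two inputs.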
Conditional instructions (i.e., branch operations) do not require a destination address
and therefore the five bits, 19 to 23, normally used to specify a destination register are
used to specify the condition (one of 16 since bit 23 is not used by conditional
instructions).
Reducing the Branch Penalty
If we're going to reduce the effect of branches on the performance of RISC
processors, we need to determine the effect of branch instructions on the performance
of the system. Because we cannot know how many branches a given program will
contain, or how likely each branch is to be taken, we have to construct a probabilistic
model to describe the system's performance. We will make the following assumptions:
1. Each non-branch instruction is executed in one cycle
2. The probability that a given instruction is a branch is pb
3. The probability that a branch instruction will be taken is pt
4. If a branch is taken, the additional penalty is b cycles
If a branch is not taken, there is no penalty
If pb is the probability that an instruction is a branch, 1 - pb is the probability that it is
not a branch.
The average number of cycles executed during the execution of a program is the sum
of the cycles taken for non-branch instructions, plus the cycles taken by branch
instructions that are taken, plus the cycles taken by branch instructions that are not
taken. We can derive an expression for the average number of cycles per instruction
as:
Tave = (1 - pb)·1 + pb·pt·(1 + b) + pb·(1 - pt)·1 = 1 + pb·pt·b.
This expression, 1 + pb·pt·b, tells us that the number of branch instructions, the
probability that a branch is taken, and the overhead per branch instruction all
contribute to the branch penalty. We are now going to examine some of the ways in
which the value of pb·pt·b can be reduced.
Branch Prediction
If we can predict the outcome of the branch instruction before it is executed, we can
start filling the pipeline with instructions from the branch target address (assuming the
branch is going to be taken). For example, if the instruction is BRA N, the processor
can start fetching instructions at locations N, N + 1, N + 2 etc., as soon as the branch
instruction is fetched from memory. In this way, the pipeline is always filled with
useful instructions.
This prediction mechanism works well with an unconditional branch like BRA N.
Unfortunately, conditional branches pose a problem. Consider a conditional branch of
the form BCC N (branch to N on carry bit clear). Should the RISC processor make the
assumption that the branch will not be taken and fetch instructions in sequence, or
should it make the assumption that the branch will be taken and fetch instructions at
the branch target address N?
As we have already said, conditional branches are required to implement various
types of high-level language construct. Consider the following fragment of high-level
language code.
if (J < K) I = I + L;
for (T = 1; T <= I; T++)
{
    .
    .
}
The first conditional operation compares J with K. Only the nature of the problem will
tell us whether J is often less than K.
The second conditional in this fragment of code is provided by the FOR construct that
tests a counter at the end of the loop and then decides whether to jump back to the
body of the construct or to terminate the loop. In this case, you could bet that the loop
is more likely to be repeated than exited. Loops can be executed thousands of times
before they are exited. Some computers look at the type of conditional branch and
then either fill the pipeline from the branch target if they predict that the branch will be
taken, or fill the pipeline from the instruction after the branch if they predict that it will
not be taken.
If we attempt to predict the behavior of a system with two outcomes (branch taken or
branch not taken), there are four possibilities:
1. Predict branch taken and branch taken: successful outcome
2. Predict branch taken and branch not taken: unsuccessful outcome
3. Predict branch not taken and branch not taken: successful outcome
4. Predict branch not taken and branch taken: unsuccessful outcome
Suppose we apply a branch penalty to each of these four possible outcomes. The
penalty is the number of cycles taken by that particular outcome, as Table 3
demonstrates. For example, if we predict that a branch will not be taken and fetch the
instructions following the branch, but the branch is actually taken (forcing the pipeline
to be loaded with instructions at the target address), the branch penalty in Table 3
is c cycles.
Table 3 The branch penalty
Prediction Result Branch penalty
Branch taken Branch taken a
Branch taken Branch not taken b
Branch not taken Branch taken c
Branch not taken Branch not taken d
We can now calculate the average penalty for a particular system. To do this we need
more information about the system. The first thing we need to know is the probability
that an instruction will be a branch (as opposed to any other category of instruction).
Assume that the probability that an instruction is a branch is pb. The next thing we
need to know is the probability that the branch instruction will be taken, pt. Finally,
we need to know the accuracy of the prediction. Let pc be the probability that a branch
prediction is correct. These values can be obtained by observing the performance of
real programs. Figure 12 illustrates all the possible outcomes of an instruction. We
can immediately write:
(1 - pb) = probability that an instruction is not a branch.
(1 - pt) = probability that a branch will not be taken.
(1 - pc) = probability that a prediction is incorrect.
These equations are obtained by using the principle that if one event or another must
take place, their probabilities must add up to unity. The average branch penalty per
branch instruction is therefore
Cave = a·P(branch predicted taken and taken) + b·P(branch predicted taken but not taken)
+ c·P(branch predicted not taken but taken) + d·P(branch predicted not taken and not taken)
Cave = a·pt·pc + b·(1 - pt)·(1 - pc) + c·pt·(1 - pc) + d·(1 - pt)·pc
Figure 12 Branch prediction
The average number of cycles added due to a branch instruction is Cave·pb
= pb·(a·pt·pc + b·(1 - pt)·(1 - pc) + c·pt·(1 - pc) + d·(1 - pt)·pc).
We can make two assumptions to help us to simplify this general expression. The first
is that a = d = N (i.e., if the prediction is correct the number of cycles is N). The other
simplification is that b = c = B (i.e., if the prediction is wrong the number of cycles
is B). The average number of cycles per branch instruction is therefore:
pb·(N·pt·pc + B·pt·(1 - pc) + B·(1 - pt)·(1 - pc) + N·(1 - pt)·pc)
= pb·(N·pc + B·(1 - pc)).
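The simplified expression is easy to check numerically, and the check makes an important point explicit: pt has cancelled out, so under the assumption a = d = N and b = c = B only the prediction accuracy pc matters. The figures below (pb, N, B) are invented for illustration.

```python
def branch_cycles(pb, pc, N, B):
    """Average branch-related cycles per instruction: correct
    predictions cost N cycles, mispredictions cost B cycles."""
    return pb * (N * pc + B * (1 - pc))

pb, N, B = 0.2, 1, 4
for pc in (0.5, 0.75, 0.95):
    print(pc, round(branch_cycles(pb, pc, N, B), 3))

# The full four-term expression gives the same answer for any pt,
# confirming that pt cancels in the algebra above.
pc = 0.9
for pt in (0.3, 0.7):
    full = pb * (N * pt * pc + B * pt * (1 - pc)
                 + B * (1 - pt) * (1 - pc) + N * (1 - pt) * pc)
    assert abs(full - branch_cycles(pb, pc, N, B)) < 1e-9
```

Raising pc from 0.5 to 0.95 cuts the branch cost by more than half in this example, which motivates the prediction schemes discussed next.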
This formula can be used to investigate tradeoffs between branch penalties, branch
probabilities and pipeline length. There are several ways of implementing branch
prediction (i.e., increasing the value of pc). Two basic approaches are static branch
prediction and dynamic branch prediction. Static branch prediction makes the
assumption that branches are always taken or never taken. Since observations of real
code have demonstrated that branches have a greater than 50% chance of being taken,
the best static branch prediction mechanism would be to fetch the next instruction
from the branch target address as soon as the branch instruction is detected.
A better method of predicting the outcome of a branch is by observing its op-code,
because some branch instructions are taken more or less frequently than other
instructions. Using the branch op-code to predict that the branch will or will not be
taken results in 75% accuracy. An extension of this technique is to devote a bit of the
op-code to the static prediction of branches. This bit is set or cleared by the compiler
depending on whether the compiler estimates that the branch is most likely to be
taken. This technique provides branch prediction accuracy in the range 74 to 94%.
Dynamic branch prediction techniques operate at runtime and use the past behavior of
the program to predict its future behavior. Suppose the processor maintains a table of
branch instructions. This branch table contains information about the likely behavior
of each branch. Each time a branch is executed, its outcome (i.e., taken or not taken) is
used to update the entry in the table. The processor uses the table to determine
whether to take the next instruction from the branch target address (i.e., branch
predicted taken) or from the next address in sequence (branch predicted not taken).
Single-bit branch predictors provide an accuracy of over 80 percent and five-bit
predictors provide an accuracy of up to 98 percent. A typical branch prediction
algorithm uses the last two outcomes of a branch to predict its future. If the last two
outcomes are X, the next branch is assumed to lead to outcome X. If the prediction is
wrong, it remains the same the next time the branch is executed (i.e., two failures are
needed to modify the prediction). After two consecutive failures, the prediction is
inverted and the other outcome assumed. This algorithm responds to trends and is not
affected by the occasional single different outcome.
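The two-failures-to-flip algorithm described above is exactly a 2-bit saturating counter, and it can be sketched in a few lines. The state encoding (0 and 1 predict not taken, 2 and 3 predict taken) is one common choice made here for illustration.

```python
class TwoBitPredictor:
    """Prediction flips only after two consecutive mispredictions."""

    def __init__(self, state=3):
        self.state = state               # start as strongly "taken"

    def predict(self):
        return self.state >= 2           # True means "predict taken"

    def update(self, taken):
        # Saturating counter: move toward 3 on taken, toward 0 on not taken.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True, True, False, True, True]   # one odd not-taken outcome
predictions = []
for taken in outcomes:
    predictions.append(p.predict())
    p.update(taken)
print(predictions)    # the single not-taken outcome doesn't flip the prediction
```

This is the trend-following behaviour described above: a loop branch that is almost always taken suffers only one misprediction per loop exit, because the single anomalous outcome nudges the counter without crossing the prediction threshold.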
Problems
1. What are the characteristics of a CISC processor?
2. The most frequently executed class of instruction is the data move instruction. Why
is this?
3. The Berkeley RISC has a 32-bit architecture and yet provides only a 13-bit literal.
Why is this and does it really matter?
4. What are the advantages and disadvantages of register windowing?
5. What is pipelining and how does it increase the performance of a computer?
6. A pipeline is defined by its length (i.e., the number of stages that can operate in
parallel). A pipeline can be short or long. What do you think are the relative
advantages of long and short pipelines?
7. What is data dependency in a pipelined system and how can its effects be
overcome?
8. RISC architectures don't permit operations on operands in memory other than load
and store operations. Why?
9. The average number of cycles required by a RISC to execute an instruction is given
by Tave = 1 + pb·pt·b,
where
The probability that a given instruction is a branch is pb
The probability that a branch instruction will be taken is pt
If a branch is taken, the additional penalty is b cycles
If a branch is not taken, there is no penalty
Draw a series of graphs of the average number of cycles per instruction as a function
of pb·pt for b = 1, 2, 3, and 4.
10. What is branch prediction and how can it be used to reduce the so-called branch
penalty in a pipelined system?