The document describes the design and analysis of a 32-bit pipelined MIPS RISC processor. A 6-stage pipeline is implemented, consisting of instruction fetch, instruction decode, register read, execute, memory access, and write-back stages. Various low-power and high-speed techniques are used, including power gating and deeper pipelining. The processor is implemented on a Spartan-3E FPGA and analyzed using Xilinx tools. Simulation results show the pipeline consumes a low power of 0.129 W and achieves a high frequency of 285.583 MHz.
The document discusses computer architecture and describes the seven dimensions of an Instruction Set Architecture (ISA). It also defines dependability and its two measures - reliability and availability. Some example performance measurements are provided along with the processor performance equation. Finally, it discusses measuring, reporting, and summarizing computer performance using benchmarks and benchmark suites.
Review paper on 32-BIT RISC processor with floating point arithmetic - IRJET Journal
This document reviews a proposed 32-bit RISC processor with floating point arithmetic. It discusses RISC and floating point concepts, reviews previous related work on RISC processor design, and proposes the design of a 32-bit RISC processor with the following key aspects:
- An instruction set with over 30 instructions in R-type, I-type, J-type, and I/O formats.
- A five-stage pipeline consisting of instruction fetch, decode, execution, memory/IO, and write-back stages.
- The inclusion of a floating point unit to support floating point arithmetic and avoid errors encountered in fixed point designs.
- Implementation in VHDL and
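As a rough illustration of why the five-stage pipeline above helps, the ideal timing model (no stalls or hazards, one instruction entering the pipeline per cycle) can be sketched as follows; the function names and numbers are illustrative, not from the paper:

```python
def pipeline_cycles(n_instructions, n_stages=5):
    """Ideal cycle count on a k-stage pipeline: the first instruction
    takes k cycles to drain, then one instruction retires per cycle."""
    return n_stages + (n_instructions - 1)

def speedup_over_unpipelined(n_instructions, n_stages=5):
    """Unpipelined execution takes k cycles per instruction."""
    return (n_stages * n_instructions) / pipeline_cycles(n_instructions, n_stages)

# 100 instructions: 104 cycles pipelined vs 500 unpipelined
assert pipeline_cycles(100) == 104
```

In the limit of many instructions the speedup approaches the stage count, which is the usual motivation for deeper pipelines.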
IRJET - Design of Low Power 32-Bit RISC Processor using Verilog HDL - IRJET Journal
This document describes the design and implementation of a 32-bit reduced instruction set computer (RISC) processor using Verilog HDL. Key aspects include:
1. The processor architecture consists of a control unit, datapath unit, and memory unit. The control unit uses a finite state machine to control the datapath.
2. The datapath contains subunits like register file, ALU, and memory interface that perform arithmetic and logic operations.
3. The processor follows a Harvard architecture with separate program and data memory. It uses a single instruction single data execution model.
4. Operation involves 5 stages - instruction fetch, decode, execute, memory access, and write back. The control unit generates signals to coordinate
The document discusses memory hierarchy in computers. It explains that memory is classified based on its distance from the processor, with the closest memory being the fastest. The different levels of memory hierarchy from fastest to slowest are CPU registers, L1 cache, L2 cache, main memory, virtual memory, and disk. Each level provides faster access but lower capacity than the levels below it.
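The speed/capacity trade-off described above is usually quantified as average memory access time (AMAT). A minimal sketch, with illustrative numbers rather than figures from the document:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: time paid on every access,
    plus the fraction of accesses that miss times the cost of
    going one level down the hierarchy."""
    return hit_time + miss_rate * miss_penalty

# L1 hit in 1 cycle, 5% miss rate, 20-cycle penalty to L2
assert amat(1, 0.05, 20) == 2.0

# Nesting models multiple levels: the L1 miss penalty is
# itself the AMAT of the L2/main-memory pair
two_level = amat(1, 0.05, amat(10, 0.1, 100))
```

Nesting the formula this way extends naturally to as many hierarchy levels as the text lists.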
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel... - IJERD Editor
This document describes the design of a 32-bit RISC CPU for convolution operations. The CPU uses a uniform 32-bit instruction format and operates in a single cycle without pipelining. It has a load/store architecture with 8 general purpose 32-bit registers and performs arithmetic and logical operations on the registers but not memory. The CPU includes a program counter, ALU, register file, instruction decoder, and clock control unit. It is designed for low power and high speed processing of convolution which is widely used in signal and image processing applications.
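The convolution workload such a CPU targets reduces to repeated multiply-accumulate operations over registers; a minimal direct 1-D convolution sketch (my own illustration, not the paper's design):

```python
def convolve(signal, kernel):
    """Direct 1-D convolution; output length is
    len(signal) + len(kernel) - 1. Each inner-loop step is one
    multiply-accumulate, the operation the CPU is built around."""
    n, k = len(signal), len(kernel)
    out = [0] * (n + k - 1)
    for i, s in enumerate(signal):
        for j, c in enumerate(kernel):
            out[i + j] += s * c
    return out

assert convolve([1, 2, 3], [1, 1]) == [1, 3, 5, 3]
```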
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document describes formal verification of a pipelined CISC microprocessor modeled after the Intel IA32 instruction set using the UCLID term-level verifier. The objective was to understand UCLID's strengths and weaknesses for modeling hardware designs and the verification process. A pipelined Y86 processor implementation from a textbook was verified against its sequential reference model. The control logic was automatically translated to UCLID format. Modularity and automation were emphasized to maintain model fidelity during verification.
This document discusses instruction-level parallelism (ILP) limitations. It covers ILP background using a MIPS example, hardware models that were studied including register renaming and branch/jump prediction assumptions. A study of ILP limitations found diminishing returns with larger window sizes and realizable processors are limited by complexity and power constraints. Simultaneous multithreading was explored as a technique to improve ILP but has its own design challenges. Today, x86 and ARM processors employ various ILP optimizations within pipeline constraints.
FPGA based 128 bit customised VLIW processor for executing dual scalar vector ... - eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Run time dynamic partial reconfiguration using microblaze soft core processor... - eSAT Journals
aydeshmukh@gmail.com
Abstract
DSP applications require fast computation and design flexibility. Partial Reconfiguration (PR) is an advanced technique that improves the flexibility of FPGAs by allowing portions of a design to be reconfigured at runtime, overwriting parts of the configuration memory. In this paper, a MicroBlaze soft-core processor and the ICAP port are used to reconfigure the FPGA at runtime. The ICAP is accessed through a lightweight custom IP that needs only a bitstream length, a go signal, and a done signal to interface with a system that supplies the partial bitstream data. The partial bitstream is provided by the processor system, which reads the partial bit files from a CompactFlash card. The targeted DSP application is matrix multiplication; the design is reconfigured by swapping partial modules at runtime. The partial bitstream is selected through the MicroBlaze soft processor over a UART interface. ISE 13.1 and PlanAhead are used for partial reconfiguration of the FPGA, and EDK is used for the MicroBlaze soft-processor design and the ICAP interface. Simulation is done with the ChipScope logic analyzer, and the complete hardware implementation is done on the Xilinx Virtex-6 ML605 platform.
Keywords — PlanAhead, EDK, Dynamic partial reconfiguration, ICAP, Matrix multiplication, Chipscope pro analysis,
DSP application, Microblaze processor
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net... - CSCJournals
An ideal network processor, that is, a programmable multi-processor device, must be capable of offering both the flexibility and the speed required for packet processing. Current network-processor systems generally fall short of this benchmark because of the traffic fluctuations inherent in packet networks; the resulting workload variation on individual pipeline stages over time ultimately degrades the overall performance of an otherwise sound system. One potential solution is to change the code running at these stages to adapt to the fluctuations. A near-robust system withstanding traffic fluctuations is the dynamically adaptive processor, which reconfigures the entire system; we introduce and study it to some extent in this paper. This is achieved by using a decision-making model and transferring the binary code to the processor through the SOAP protocol.
1. Parallel computation is needed to achieve high performance as modern processors have limitations despite features like caches, buses, and pipelines. Parallel computers use multiple CPUs working together to solve problems faster.
2. Flynn's classification categorizes computer architectures based on their instruction and data flows as single instruction stream single data stream (SISD), single instruction stream multiple data stream (SIMD), or multiple instruction stream multiple data stream (MIMD).
3. Important metrics for measuring parallel performance include speedup, which measures improvement over sequential execution, and efficiency, which relates speedup to number of processors used. According to Amdahl's law, even small amounts of sequential code limit maximum speedup attainable.
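Amdahl's law and the speedup/efficiency metrics above can be made concrete with a small sketch (function names are mine, chosen for illustration):

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Amdahl's law: overall speedup when only the parallel
    fraction of the work benefits from extra processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

def efficiency(serial_fraction, n_processors):
    """Speedup normalized by processor count."""
    return amdahl_speedup(serial_fraction, n_processors) / n_processors

# 10% serial code caps speedup below 10x no matter how many CPUs
assert amdahl_speedup(0.1, 10**9) < 10.0
```

With 10% serial code and 10 processors the speedup is only 1/0.19, about 5.26x, illustrating how even a small sequential portion dominates.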
This document discusses system design techniques and networks for embedded systems. It covers topics like design methodologies, requirement analysis, specifications, system analysis and architecture design. It discusses different design flows like waterfall model, spiral model and successive refinement model. It also discusses quality assurance techniques important for delivering quality embedded systems. Specific techniques covered include concurrent engineering, requirements analysis, different specification languages like SDL and state charts, system analysis using CRC cards and ensuring quality throughout the design process.
The document discusses RISC (reduced instruction set computers) architectures compared to CISC (complex instruction set computers) architectures. Some key points:
- RISCs aim to simplify the instruction set to allow for faster execution, while CISCs include more complex instructions closer to high-level languages.
- Studies show programs spend most time on simple operations like moves and branches, using simple addressing modes and local variables, informing the RISC approach.
- RISCs use load/store architectures, fixed-length instructions, delayed loading, and many registers to improve performance over CISCs.
- While RISCs have advantages in speed and simplicity, comparisons are complex, and modern processors combine RISC and CISC techniques.
VTU 4TH SEM CSE COMPUTER ORGANIZATION SOLVED PAPERS OF JUNE-2013 JUNE-2014 & ... - vtunotesbysree
This document contains a solved question paper for Computer Organization from June 2013. It includes questions and detailed answers on topics such as basic computer operations, number representation systems, addressing modes, input/output operations, interrupts, bus arbitration, and the Universal Serial Bus protocol. The solved questions cover concepts, provide examples, and include diagrams to illustrate computer hardware and architecture.
A Customized Reconfiguration Controller with Remote Direct ICAP Access for Dy... - TELKOMNIKA JOURNAL
As FPGA dynamic partial reconfiguration enters the mainstream, the design of reconfiguration controllers has become an active research area. Most existing reconfiguration controllers support only the loading of a partial bitstream into configuration memory, without allowing the user to access the ICAP directly. This paper presents the architecture of a customized reconfiguration controller with remote direct ICAP access, which lets the user configure or read back device-internal registers and thus offers higher controllability over the reconfigurable device. Additionally, the proposed reconfiguration controller achieves at least 3.19 Gbps of reconfiguration throughput, which reduces platform service downtime during dynamic partial reconfiguration. To reduce the latency and transmission overhead of remote functional updates, the partial bitstream is compressed with run-length encoding before transmission.
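The run-length encoding step exploits the long runs of identical bytes typical of partial bitstreams. A minimal sketch of the idea (not the paper's actual codec):

```python
def rle_encode(data):
    """Run-length encode a byte string as (value, count) pairs;
    configuration bitstreams often contain long zero runs."""
    if not data:
        return []
    runs = []
    current, count = data[0], 1
    for b in data[1:]:
        if b == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = b, 1
    runs.append((current, count))
    return runs

def rle_decode(runs):
    """Expand (value, count) pairs back to the original bytes."""
    return bytes(b for value, count in runs for b in [value] * count)

bitstream = b"\x00" * 6 + b"\xff\xff" + b"\x00" * 4
assert rle_decode(rle_encode(bitstream)) == bitstream
```

The encoding is lossless and trivially reversible, which matters here because a corrupted bitstream could misconfigure the device.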
This document discusses the implementation of an H.264 video decoder on an ARM11 multiprocessor core (MPCore). It first provides background on the ARM11 MPCore architecture and its advantages for multimedia applications. It then discusses the computational complexity of H.264 decoding and introduces an optimized algorithm for context-adaptive variable length coding (CAVLC) decoding and deblocking filtering. The implementation of a multimedia framework on the ARM11 MPCore is described, including porting Linux and GStreamer libraries. Finally, directories and files related to the CMM driver and testing procedure are outlined.
High speed customized serial protocol for IP integration on FPGA based SOC ap... - IJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
This document discusses instruction level parallelism (ILP) and how it can be used to improve performance by overlapping the execution of instructions through pipelining. ILP refers to the potential overlap among instructions within a basic block. Factors like dynamic branch prediction and compiler dependence analysis can impact the ideal pipeline CPI and number of data hazard stalls. Loop level parallelism refers to the parallelism available across iterations of a loop. Data dependencies between instructions, if not properly handled, can limit parallelism and require instructions to execute in order. The three types of data dependencies are data, name, and control dependencies.
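The data and name dependence classes can be illustrated with a toy classifier over (destination, sources) instruction tuples; control dependences are omitted since they involve branches rather than registers (this is my sketch, not from the document):

```python
def classify_dependencies(producer, consumer):
    """Classify dependencies between two instructions, each given
    as (dest_reg, {source_regs}), with the producer earlier in
    program order."""
    p_dest, p_srcs = producer
    c_dest, c_srcs = consumer
    deps = []
    if p_dest in c_srcs:
        deps.append("RAW")   # true (data) dependence
    if c_dest in p_srcs:
        deps.append("WAR")   # anti-dependence (a name dependence)
    if c_dest == p_dest:
        deps.append("WAW")   # output dependence (a name dependence)
    return deps

# add r1, r2, r3  followed by  sub r4, r1, r5: true dependence on r1
assert classify_dependencies(("r1", {"r2", "r3"}), ("r4", {"r1", "r5"})) == ["RAW"]
```

Only RAW dependences reflect actual data flow; the WAR/WAW name dependences can be removed by renaming, which is why they need not serialize execution.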
Efficient register renaming and recovery for high-performance processors - Jinto George
Register renaming is a technique used to improve the performance and speed of high-performance processors. It can be implemented using RAM, CAM, or a hybrid combination of RAM and CAM.
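A minimal sketch of the renaming idea itself, independent of the RAM/CAM implementation choice the summary mentions (register and function names are illustrative):

```python
def rename(instructions, n_arch_regs=8):
    """Map each architectural destination register to a fresh
    physical register, eliminating WAR/WAW name dependences.
    Instructions are (dest, [srcs]) tuples; returns renamed tuples."""
    # rename table: architectural reg -> current physical reg
    table = {f"r{i}": f"p{i}" for i in range(n_arch_regs)}
    next_phys = n_arch_regs
    renamed = []
    for dest, srcs in instructions:
        new_srcs = [table[s] for s in srcs]  # read current mappings first
        table[dest] = f"p{next_phys}"        # then allocate a fresh phys reg
        next_phys += 1
        renamed.append((table[dest], new_srcs))
    return renamed

# two writes to r1 get distinct physical registers (WAW removed)
prog = [("r1", ["r2"]), ("r1", ["r3"])]
assert rename(prog) == [("p8", ["p2"]), ("p9", ["p3"])]
```

A real design would also bound the physical register file and keep checkpoints of the table for branch-misprediction recovery; this sketch shows only the mapping step.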
The document discusses parallelism and techniques to improve computer performance through parallel execution. It describes instruction level parallelism (ILP) where multiple instructions can be executed simultaneously through techniques like pipelining and superscalar processing. It also discusses processor level parallelism using multiple processors or processor cores to concurrently execute different tasks or threads.
The document discusses parallel processing and provides classifications of parallel computer architectures. It describes Flynn's classification of computer architectures as single instruction stream single data stream (SISD), single instruction stream multiple data stream (SIMD), multiple instruction stream single data stream (MISD), and multiple instruction stream multiple data stream (MIMD). It also discusses pipeline computers, array processors, and multiprocessor systems as different architectural configurations for parallel computers. Pipelining is described as a technique to decompose a process into sub-operations that execute concurrently in dedicated segments to achieve overlapping computation.
Vector Supercomputers and Scientific Array Processors - Hsuvas Borkakoty
This presentation discusses vector supercomputers and scientific attached processors. It covers the generations and processing speeds of vector supercomputers, as well as their application areas. Scientific attached processors are designed to enhance the floating point capabilities of host computers and are used to accelerate applications like structural analysis and computational chemistry. The document projects future speeds for scientific attached processors and discusses their advantages of enhancing host machine speeds while having lower costs than mainframes, as well as limitations like requiring microcoding and expensive software.
Implementation of resource sharing strategy for power optimization in embedde... - Alexander Decker
This document discusses the implementation of a resource sharing strategy to optimize power in embedded processors. The strategy is implemented at the hardware level in the decode stage of a 32-bit RISC processor with a 4-stage pipeline. By redefining some instructions to share common resources like adders and decoders, unnecessary switching activity is reduced, lowering dynamic power consumption. Power analysis shows the modified design consumes 3mW less power, a 2.65% improvement, across different clock frequencies compared to the original design. The proposed strategy successfully optimizes power through hardware-level resource sharing.
An octa core processor with shared memory and message-passing - eSAT Journals
Abstract: This being the era of fast, high-performance computing, efficient optimizations are needed in the processor architecture and in the memory hierarchy alike. Advancing communication and multimedia applications keep pushing up the number of cores in main processors: dual-core, quad-core, octa-core, and so on. To enhance the overall performance of a multi-processor chip, however, inter-core synchronization must also improve. Thus, an MPSoC with 8 cores supporting both message-passing and shared-memory inter-core communication mechanisms is implemented on a Virtex-5 LX110T FPGA. Each core is based on the MIPS III (Microprocessor without Interlocked Pipelined Stages) ISA, handles only integer instructions, and has a six-stage pipeline with a data-hazard detection unit and forwarding logic. The eight processing cores and one central shared-memory core are interconnected using a 3x3 2-D mesh topology based Network-on-Chip (NoC) with a virtual-channel router. The router is four-stage pipelined, supports the DOR X-Y routing algorithm, and uses round-robin arbitration. For verification and functionality testing of the fully synthesized multi-core processor, a matrix multiplication operation is mapped onto it; the multiplications and additions for each element of the resultant matrix are partitioned and scheduled among the eight cores to maximize throughput. All processor-design code is written in Verilog HDL.
Keywords: MPSoC, message-passing, shared memory, MIPS, ISA, wormhole router, network-on-chip, SIMD, data level parallelism, 2-D mesh, virtual channel
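The row-wise partitioning of matrix multiplication across cores can be sketched sequentially; each partition below stands in for the work one of the eight cores would execute in parallel (this is an illustration, not the paper's Verilog):

```python
def matmul_rows(a, b, rows):
    """Compute only the given rows of a @ b: the unit of work
    one core would receive."""
    n, k = len(b[0]), len(b)
    return {i: [sum(a[i][t] * b[t][j] for t in range(k)) for j in range(n)]
            for i in rows}

def parallel_matmul(a, b, n_cores=8):
    """Partition result rows round-robin among cores, then merge."""
    partitions = [range(c, len(a), n_cores) for c in range(n_cores)]
    result = {}
    for rows in partitions:              # each chunk maps to one core
        result.update(matmul_rows(a, b, rows))
    return [result[i] for i in range(len(a))]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
assert parallel_matmul(a, b) == [[19, 22], [43, 50]]
```

Rows are independent, so no inter-core communication is needed during the compute phase; only the final merge touches shared state, which is where the NoC and shared-memory core come in.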
This document discusses instruction-level parallelism (ILP) limitations. It covers ILP background using a MIPS example, hardware models that were studied including register renaming and branch/jump prediction assumptions. A study of ILP limitations found diminishing returns with larger window sizes and realizable processors are limited by complexity and power constraints. Simultaneous multithreading was explored as a technique to improve ILP but has its own design challenges. Today, x86 and ARM processors employ various ILP optimizations within pipeline constraints.
Fpga based 128 bit customised vliw processor for executing dual scalarvector ...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Run time dynamic partial reconfiguration using microblaze soft core processor...eSAT Journals
aydeshmukh@gmail.com
Abstract
DSP Application requires a fast computations & flexibility of the design. Partial Reconfiguration (PR) is an advanced technique,
which improves the flexibility of FPGAs by allowing portions of a design to be reconfigured at runtime by overwriting parts of the
configuration memory. In this paper we are using microblaze soft core processor & ICAP Port to reconfigure the FPGA at runtime.
ICAP is accessed through a light-weight custom IP which requires bit stream length, go, and done signal to interface to a system that
provides partial bit stream data. The partial bit stream is provided by the processor system by reading the partial bit files from the
compact flash card. Our targeted DSP application is matrix multiplication; we are reconfiguring design by changing partial modules
at run time. To change the partial bit stream we interfaces a microblaze Soft processor & using a UART interface.ISE13.1 &
PlanAhead is used for partial reconfiguration of FPGA .EDK is used for microblaze soft processor design & ICAP Interface .The
simulation is done using Chip Scope Logic Analyzer & the complete hardware implementation is done on Xilinx VIRTEX -6 ML605
Platform.
Keywords — PlanAhead, EDK, Dynamic partial reconfiguration, ICAP, Matrix multiplication, Chipscope pro analysis,
DSP application, Microblaze processor
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...CSCJournals
An ideal Network Processor, that is, a programmable multi-processor device must be capable of offering both the flexibility and speed required for packet processing. But current Network Processor systems generally fall short of the above benchmarks due to traffic fluctuations inherent in packet networks, and the resulting workload variation on individual pipeline stage over a period of time ultimately affects the overall performance of even an otherwise sound system. One potential solution would be to change the code running at these stages so as to adapt to the fluctuations; a near robust system with standing traffic fluctuations is the dynamic adaptive processor, reconfiguring the entire system, which we introduce and study to some extent in this paper. We achieve this by using a crucial decision making model, transferring the binary code to the processor through the SOAP protocol.
1. Parallel computation is needed to achieve high performance as modern processors have limitations despite features like caches, buses, and pipelines. Parallel computers use multiple CPUs working together to solve problems faster.
2. Flynn's classification categorizes computer architectures based on their instruction and data flows as single instruction stream single data stream (SISD), single instruction stream multiple data stream (SIMD), or multiple instruction stream multiple data stream (MIMD).
3. Important metrics for measuring parallel performance include speedup, which measures improvement over sequential execution, and efficiency, which relates speedup to number of processors used. According to Amdahl's law, even small amounts of sequential code limit maximum speedup attainable.
This document discusses system design techniques and networks for embedded systems. It covers topics like design methodologies, requirement analysis, specifications, system analysis and architecture design. It discusses different design flows like waterfall model, spiral model and successive refinement model. It also discusses quality assurance techniques important for delivering quality embedded systems. Specific techniques covered include concurrent engineering, requirements analysis, different specification languages like SDL and state charts, system analysis using CRC cards and ensuring quality throughout the design process.
The document discusses RISC (reduced instruction set computers) architectures compared to CISC (complex instruction set computers) architectures. Some key points:
- RISCs aim to simplify the instruction set to allow for faster execution, while CISCs include more complex instructions closer to high-level languages.
- Studies show programs spend most time on simple operations like moves and branches, using simple addressing modes and local variables, informing the RISC approach.
- RISCs use load/store architectures, fixed-length instructions, delayed loading, and many registers to improve performance over CISCs.
- While RISCs have advantages in speed and simplicity, comparisons are complex and modern processors combine RIS
VTU 4TH SEM CSE COMPUTER ORGANIZATION SOLVED PAPERS OF JUNE-2013 JUNE-2014 & ...vtunotesbysree
This document contains a solved question paper for Computer Organization from June 2013. It includes questions and detailed answers on topics such as basic computer operations, number representation systems, addressing modes, input/output operations, interrupts, bus arbitration, and the Universal Serial Bus protocol. The solved questions cover concepts, provide examples, and include diagrams to illustrate computer hardware and architecture.
A Customized Reconfiguration Controller with Remote Direct ICAP Access for Dy...TELKOMNIKA JOURNAL
As FPGA dynamic partial reconfiguration gets into the mainstream, the design of reconfiguration controllers has become an active research area. Most existing reconfiguration controllers support only the loading of a partial bitstream into configuration memory, without allowing the user to access the ICAP directly. This paper presents the architecture of a customized reconfiguration controller with remote direct ICAP access, which allows the user to configure or read back device internal registers and so offers higher controllability over the reconfigurable device. Additionally, the proposed reconfiguration controller achieves at least 3.19 Gbps of reconfiguration throughput, which reduces platform service downtime during dynamic partial reconfiguration. To reduce the latency and transmission overhead of remote functional updates, the partial bitstream is compressed with run-length encoding before transmission.
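As an illustrative sketch only (Python, not the controller's actual implementation), run-length encoding compresses a bitstream by collapsing runs of identical bytes; the sample bitstream below is invented for demonstration:

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse runs of identical bytes into (value, run_length) pairs.
    Configuration bitstreams often contain long runs of padding bytes,
    which is what makes RLE attractive before transmission."""
    runs: list[list[int]] = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([b, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([v]) * n for v, n in runs)

# A toy "bitstream" with long zero runs compresses to just four pairs:
bitstream = b"\x00" * 100 + b"\xAA\xBB" + b"\x00" * 50
runs = rle_encode(bitstream)
assert rle_decode(runs) == bitstream
print(runs)  # [(0, 100), (170, 1), (187, 1), (0, 50)]
```

The decoder must be lossless, which the round-trip assertion checks; real bitstream compressors add framing so the receiver knows where runs end.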
This document discusses the implementation of an H.264 video decoder on an ARM11 multiprocessor core (MPCore). It first provides background on the ARM11 MPCore architecture and its advantages for multimedia applications. It then discusses the computational complexity of H.264 decoding and introduces an optimized algorithm for context-adaptive variable length coding (CAVLC) decoding and deblocking filtering. The implementation of a multimedia framework on the ARM11 MPCore is described, including porting Linux and GStreamer libraries. Finally, directories and files related to the CMM driver and testing procedure are outlined.
High speed customized serial protocol for IP integration on FPGA based SOC ap...IJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
This document discusses instruction level parallelism (ILP) and how it can be used to improve performance by overlapping the execution of instructions through pipelining. ILP refers to the potential overlap among instructions within a basic block. Factors like dynamic branch prediction and compiler dependence analysis can impact the ideal pipeline CPI and number of data hazard stalls. Loop level parallelism refers to the parallelism available across iterations of a loop. Data dependencies between instructions, if not properly handled, can limit parallelism and require instructions to execute in order. The three types of data dependencies are data, name, and control dependencies.
Efficient register renaming and recovery for high-performance processors.Jinto George
Register renaming is a technique used to improve the performance and speed of high-performance processors. It can be implemented using RAM, CAM, or a hybrid combination of RAM and CAM structures.
The document discusses parallelism and techniques to improve computer performance through parallel execution. It describes instruction level parallelism (ILP) where multiple instructions can be executed simultaneously through techniques like pipelining and superscalar processing. It also discusses processor level parallelism using multiple processors or processor cores to concurrently execute different tasks or threads.
The document discusses parallel processing and provides classifications of parallel computer architectures. It describes Flynn's classification of computer architectures as single instruction stream single data stream (SISD), single instruction stream multiple data stream (SIMD), multiple instruction stream single data stream (MISD), and multiple instruction stream multiple data stream (MIMD). It also discusses pipeline computers, array processors, and multiprocessor systems as different architectural configurations for parallel computers. Pipelining is described as a technique to decompose a process into sub-operations that execute concurrently in dedicated segments to achieve overlapping computation.
Vector Supercomputers and Scientific Array ProcessorsHsuvas Borkakoty
This presentation discusses vector supercomputers and scientific attached processors. It covers the generations and processing speeds of vector supercomputers, as well as their application areas. Scientific attached processors are designed to enhance the floating point capabilities of host computers and are used to accelerate applications like structural analysis and computational chemistry. The document projects future speeds for scientific attached processors and discusses their advantages of enhancing host machine speeds while having lower costs than mainframes, as well as limitations like requiring microcoding and expensive software.
Implementation of resource sharing strategy for power optimization in embedde...Alexander Decker
This document discusses the implementation of a resource sharing strategy to optimize power in embedded processors. The strategy is implemented at the hardware level in the decode stage of a 32-bit RISC processor with a 4-stage pipeline. By redefining some instructions to share common resources like adders and decoders, unnecessary switching activity is reduced, lowering dynamic power consumption. Power analysis shows the modified design consumes 3mW less power, a 2.65% improvement, across different clock frequencies compared to the original design. The proposed strategy successfully optimizes power through hardware-level resource sharing.
An octa core processor with shared memory and message-passingeSAT Journals
Abstract: In this era of fast, high-performance computing, efficient optimizations are needed both in the processor architecture and in the memory hierarchy. Every day, advancing applications in communication and multimedia systems compel an increase in the number of cores in the main processor, viz. dual-core, quad-core, octa-core and so on. But to enhance the overall performance of a multi-processor chip, there are stringent requirements to improve inter-core synchronization. Thus, an MPSoC with 8 cores supporting both message-passing and shared-memory inter-core communication mechanisms is implemented on a Virtex 5 LX110T FPGA. Each core is based on the MIPS III (Microprocessor without Interlocked Pipelined Stages) ISA, handles only integer-type instructions, and has a six-stage pipeline with a data hazard detection unit and forwarding logic. The eight processing cores and one central shared-memory core are interconnected using a 3x3 2-D mesh topology based Network-on-Chip (NoC) with a virtual channel router. The router is four-stage pipelined, supports the DOR X-Y routing algorithm, and uses round-robin arbitration. For verification and functionality testing of the fully synthesized multi-core processor, a matrix multiplication operation is mapped onto it; the multiplications and additions for each element of the resultant matrix are partitioned and scheduled among the eight cores to obtain maximum throughput. All processor design code is written in Verilog HDL. Keywords: MPSoC, message-passing, shared memory, MIPS, ISA, wormhole router, network-on-chip, SIMD, data level parallelism, 2-D Mesh, virtual channel
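The DOR X-Y routing mentioned above can be sketched in a few lines (Python, purely illustrative, with invented node coordinates): a flit travels fully along the X dimension first, then along Y, which is deterministic and deadlock-free on a mesh.

```python
def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-ordered (X-Y) routing on a 2-D mesh: step along X
    until the X coordinate matches the destination, then step along Y.
    Returns the full hop sequence including source and destination."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1   # move one hop toward dst in X
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1   # then one hop at a time in Y
        path.append((x, y))
    return path

# On a 3x3 mesh, a packet from the core at (0, 0) to a node at (2, 1):
print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because every packet orders its turns the same way, no cyclic channel dependency can form, which is why wormhole and virtual-channel routers commonly default to it.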
Design & Simulation of RISC Processor using Hyper Pipelining TechniqueIOSR Journals
This hyper-pipelining technique differs from the pipelined instruction decoding known from RISC processors. The point is that hyper-pipelining can be applied on top of any sequential logic, for example a RISC processor, independent of its underlying functionality. A RISC processor with pipelined instruction-set decoding can automatically be hyper-pipelined to generate CMF individual RISC processors. Hyper-pipelining implements additional registers and can use register balancing for fine-grained timing optimization. The method is also called "C-slow retiming". The main benefit is the multiplication of the core's functionality by implementing only registers, which is a great advantage for ASICs and obviously very attractive for FPGAs with their already existing registers.
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGAVLSICS Design
Adders are an almost obligatory component of every contemporary integrated circuit. The prerequisite for an adder is that it is primarily fast and secondarily efficient in terms of power consumption and chip area, so careful optimization of the adder is of the greatest importance. This optimization can be attained at two levels: circuit optimization, in which the sizes of transistors are manipulated, or logic optimization, in which the Boolean equations are rearranged (or manipulated) to optimize speed, area, and power consumption. This paper focuses on the optimization of the adder through technology-independent mapping. The work presents 20 different logical constructions of a 1-bit adder cell in CMOS logic and analyzes their performance in terms of transistor count, delay, and power dissipation, using Tanner EDA with TSMC MOSIS 250nm technology. From this analysis the optimized equation is chosen to construct a full adder circuit in terms of multiplexers. These logic-optimized multiplexer-based adders are incorporated into selected existing adders, namely the ripple carry adder, carry look-ahead adder, carry skip adder, carry select adder, carry increment adder, and carry save adder, and their performance is analyzed in terms of area (slices used) and maximum combinational path delay as a function of size. The adders were implemented with Xilinx ISE 12.1 targeting the Spartan3E XC3S500-5FG320 FPGA, with bit sizes of 8, 16, 32, and 64 bits; this variety of sizes provides more insight into the performance of each adder in terms of area and delay as a function of size.
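As an illustrative behavioural model (Python, not the transistor-level CMOS construction analysed above), a full adder can be expressed with one XOR and two 2:1 multiplexers, and such cells chain into a ripple-carry adder:

```python
def mux(sel: int, a: int, b: int) -> int:
    """2:1 multiplexer: returns a when sel is 1, else b."""
    return a if sel else b

def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder built from one XOR (the propagate signal p)
    and two 2:1 muxes. If p = a ^ b is 1 the sum inverts cin and the
    carry passes cin through; if p is 0 the sum is cin and the carry
    equals a (since a == b)."""
    p = a ^ b
    s = mux(p, cin ^ 1, cin)   # sum   = a ^ b ^ cin
    cout = mux(p, cin, a)      # carry = a*b + cin*(a ^ b)
    return s, cout

def ripple_carry_add(x: int, y: int, width: int = 8) -> int:
    """Chain `width` mux-based full adders; each carry-out feeds the
    next stage's carry-in (the ripple that limits the critical path)."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

# Exhaustive check over all 8-bit operand pairs (modulo 2**8):
assert all(ripple_carry_add(x, y) == (x + y) % 256
           for x in range(256) for y in range(256))
```

The mux formulation is attractive in hardware because the same 2:1 mux primitive serves both the sum and carry paths; the software model only checks the Boolean equations, not timing.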
Hardback solution to accelerate multimedia computation through mgp in cmpeSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
INCREASING THE THROUGHPUT USING EIGHT STAGE PIPELININGijiert bestjournal
Using re-programmable logic components along with HDL languages encompasses wider and wider areas of practical application, becoming a standard of complex digital system design. One of the basic tasks to be carried out in the design process is obtaining the highest efficiency of the solution under design, so designers are still looking for methods to speed up design processing time. The pipelining mechanism is one of these methods: it helps to speed up some dedicated operations. In the early stage of design, a given unit described in a high-level language is divided into independent parts, which are synchronized with each other via intermediate registers and a synchronization signal (the pipelining mechanism). 8-stage pipelining is a key implementation technique used to make fast CPUs; it is an optimization technique used to speed up instruction execution. The throughput of an instruction pipeline is increased while the latency of each instruction's execution is decreased. This new 8-stage pipeline includes two instruction fetch, one instruction decode, two execution, two memory, and one write-back stages. It offers advantages in both speed and suitability for synthesizable RISC design.
This document proposes extending algorithmic skeletons with event-driven programming to address the inversion of control problem in skeleton frameworks. It introduces event listeners that can be registered at event hooks within skeletons to access runtime information. This allows implementing non-functional concerns like logging and performance monitoring separately from the core parallel logic. The approach is implemented in the Skandium skeleton library, and examples are given of a logger and online performance monitor built using it. An analysis shows the overhead of processing events is negligible, at around 20 microseconds per event.
Parallel processing involves performing multiple tasks simultaneously to increase computational speed. It can be achieved through pipelining, where instructions are overlapped in execution, or vector/array processors where the same operation is performed on multiple data elements at once. The main types are SIMD (single instruction multiple data) and MIMD (multiple instruction multiple data). Pipelining provides higher throughput by keeping the pipeline full but requires handling dependencies between instructions to avoid hazards slowing things down.
A REVIEW ON ANALYSIS OF 32-BIT AND 64-BIT RISC PROCESSORSIRJET Journal
This document provides a review and comparison of 32-bit and 64-bit RISC processors. It discusses the system architectures of 32-bit and 64-bit RISC processors, including their instruction sets, registers, arithmetic logic units, control units, and flag registers. It also summarizes previous research comparing the performance of 16-bit and 32-bit RISC processors in terms of power consumption, operating frequency, and delay. The document aims to analyze and compare implementation models and operational elements such as acceleration and power dissipation between 32-bit and 64-bit RISC processors.
A NETWORK-BASED DAC OPTIMIZATION PROTOTYPE SOFTWARE 2 (1).pdfSaiReddy794166
The International Journal of Engineering and Science and Research is an online journal published in English. Its aim is to publish peer-reviewed research articles without delay in developing areas of engineering and science research.
STUDY OF VARIOUS FACTORS AFFECTING PERFORMANCE OF MULTI-CORE PROCESSORSijdpsjournal
Advances in integrated circuit processing allow for more microprocessor design options. As the chip multiprocessor (CMP) becomes the predominant topology for leading microprocessors, critical components of the system are now integrated on a single chip, enabling sharing of computation resources that was not previously possible. In addition, the virtualization of these computation resources exposes the system to a mix of diverse and competing workloads. On-chip cache memory is a resource of primary concern, as it can be dominant in controlling overall throughput. This paper presents an analysis of various parameters affecting the performance of multi-core architectures: varying the number of cores, changing the L2 cache size, and varying the directory size from 64 to 2048 entries on 4-node, 8-node, 16-node, and 64-node chip multiprocessors. This in turn presents an open area of research on multi-core processors with private/shared last-level caches, as the future trend seems to be towards tiled architectures executing multiple parallel applications with optimized silicon area utilization and excellent performance.
A Unique Test Bench for Various System-on-a-Chip IJECEIAES
This paper discusses a standard flow for how an automated, constrained-random test bench environment can efficiently verify an SoC for functionality and coverage. Today, in the era of multimillion-gate ASICs, reusable intellectual property (IP), and system-on-a-chip (SoC) designs, verification consumes about 70% of the design effort. Automation means a machine completes a task autonomously, quicker and with predictable results; it requires standard processes with well-defined inputs and outputs. Using this efficient methodology, it is possible to provide a general-purpose automation solution for verification with today's technology, and tools automating various portions of the verification process are being introduced. Here we consider a communication-based SoC; the paper discusses the methodology used to verify such an SoC-based environment, exploring Cadence's efficient verification methodology libraries as the solution to this problem, which can be taken as a state-of-the-art approach to verifying SoC environments. The goal of this paper is to emphasize a unique testbench for different SoCs using efficient verification constructs implemented in SystemVerilog for SoC verification.
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...IDES Editor
In this paper, we have proposed a novel architectural technique which can be used to boost the performance of modern-day processors. It is especially useful in certain code constructs like small loops and try-catch blocks. The technique aims to improve performance by reducing the number of instructions that need to enter the pipeline at all. We also demonstrate its working in a scalar pipelined soft-core processor developed by us. Lastly, we present how a superscalar microprocessor can take advantage of this technique to increase its performance.
The document analyzes the performance of the LEON 3FT processor at different operating frequencies. A hardware implementation using the LEON 3FT processor was tested by executing benchmark programs at various frequencies. The results show that execution time decreases with higher operating frequencies, though there is a maximum frequency limit due to hardware constraints. Future work involves attempting to increase this maximum frequency limit while maintaining processor performance.
Automatically partitioning packet processing applications for pipelined archi...Ashley Carter
This document describes a technique for automatically partitioning sequential packet processing applications into coordinated parallel subtasks that can be efficiently mapped to pipelined network processor architectures. The technique balances work among pipeline stages and minimizes data transmission between stages. It was implemented in an auto-partitioning C compiler for Intel network processors. Experimental results showed over 4x speedups for IPv4 and IP forwarding benchmarks on a 9-stage pipeline compared to non-partitioned code.
The proposed system is an efficient 16-bit multiplier-accumulator (MAC) using the Radix-8 and Radix-16 modified Booth algorithms together with several adders (SPST adder, carry select adder, parallel prefix adder), implemented in VHDL (Very High Speed Integrated Circuit Hardware Description Language). The proposed system provides low power, high speed, and less delay. For both Booth multipliers, the power consumption (mW) and estimated delay (ns) are compared. Digital signal processing applications such as the fast Fourier transform, finite impulse response filtering, and convolution need high-speed, low-power MAC units to construct an adder. By reducing glitches (1-to-0 transitions) and spikes (0-to-1 transitions), the speed of operation is improved and dynamic power is reduced; the adder designed with SPST avoids unwanted glitches and spikes, reducing switching power dissipation and dynamic power. The speed can be improved by halving the number of partial products through grouping of bits in the multiplier term. The proposed Radix-8 and Radix-16 modified Booth MAC with SPST reduces delay and obtains lower power consumption compared to an array MAC.
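The partial-product halving mentioned above comes from Booth recoding. As an illustrative sketch (Python, radix-4 shown for brevity rather than the radix-8/16 variants of the paper), each overlapping 3-bit group of the multiplier becomes one signed digit in {-2, ..., 2}:

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Radix-4 modified Booth recoding of an n-bit two's-complement
    multiplier y: n/2 digits d_i = b[2i-1] + b[2i] - 2*b[2i+1]
    (with b[-1] = 0), so the number of partial products is halved
    versus one partial product per multiplier bit."""
    assert n % 2 == 0
    def bit(i: int) -> int:
        if i < 0:
            return 0                       # b[-1] = 0 by convention
        return (y >> min(i, n - 1)) & 1    # sign-extend beyond bit n-1
    return [bit(2*i - 1) + bit(2*i) - 2 * bit(2*i + 1) for i in range(n // 2)]

def booth_multiply(x: int, y: int, n: int = 8) -> int:
    """Sum the shifted partial products x * d_i * 4**i; exact for
    any n-bit two's-complement multiplier y."""
    return sum(x * d * 4**i for i, d in enumerate(booth_radix4_digits(y, n)))

# The recoding reproduces ordinary signed multiplication:
assert booth_multiply(7, 6) == 42
assert booth_multiply(13, -5) == -65
```

Radix-8 and radix-16 recoding extend the same idea to 4- and 5-bit groups, cutting partial products further at the cost of harder multiples (e.g. 3x), which is the trade-off the paper's MAC design exploits.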
This document contains the questions and answers from a computer architecture and organization exam. It includes questions about the differences between computer architecture and organization, instruction formats, bus definitions, cache memory advantages, and virtual memory. The responses provide detailed explanations of concepts like locality of reference, thrashing, address mapping, cache hits and misses, and hierarchical memory systems. Justification is given for using a hierarchical approach to improve performance across different memory types. The differences between paging and segmentation in virtual memory are also distinguished.
This document discusses the I2C bus protocol and its implementation on an FPGA to interface with low speed peripheral devices. It also provides background on VLSI design, including the evolution of integration density over time, the VLSI design flow from behavioral to layout representations, and historical context on increasing processing power needs driving advances in integration technologies. The I2C protocol allows communication between multiple chips using only two pins, addressing the need for lower pin counts as chip sizes decrease. The document implements I2C on an FPGA to interface with a DS1307 peripheral and synthesizes it on a Spartan 3E chip.
This document summarizes a research paper that proposes a new approach called DiffP for more energy efficient ad hoc reprogramming of sensor networks. DiffP aims to mitigate the effects of program layout modifications and maximize similarity between old and new software. It also organizes global variables in a novel way to eliminate the effect of variable shifting. The document provides background on challenges with reprogramming deployed sensor networks due to limited energy, processing and memory resources. It reviews related work on dissemination protocols and reprogramming schemes, noting limitations such as producing large patches from layout changes or variable shifts. DiffP is presented as a potential improvement over existing approaches.
Design and Analysis of A 32-bit Pipelined MIPS Risc Processor
International Journal of VLSI design & Communication Systems (VLSICS) Vol 10, No 5, October 2019
DOI: 10.5121/vlsic.2019.10501
DESIGN AND ANALYSIS OF A 32-BIT PIPELINED MIPS RISC PROCESSOR

P. Indira¹, M. Kamaraju² and Ved Vyas Dwivedi³

¹,³Department of Electronics and Communication Engineering, CU Shah University, Wadhwan, Gujarat, India
²Department of Electronics and Communication Engineering, Gudlavalleru Engineering College, JNT University, Kakinada, Andhra Pradesh, India
ABSTRACT
Pipelining is a technique that exploits parallelism among the instructions in a sequential instruction stream to increase throughput and lessen the total time needed to complete the work. The major objective of this architecture is to design a low-power, high-performance structure that fulfils all the requirements of the design. Critical factors such as power, frequency, area, and propagation delay are analysed on a Spartan 3E XC3S1600E device using the Xilinx tool. In this paper, the 32-bit MIPS RISC processor uses 6-stage pipelining to optimize these critical performance factors. The fundamental functional blocks of the processor include Input/Output blocks, configurable logic blocks, Block RAM, and the Digital Clock Manager, and each block can connect to multiple sources for routing. Auxiliary units further enhance the performance of the processor. A comparative study evaluates the designed model in terms of area, power, and frequency, and MATLAB 2D/3D graphs represent the relationships among the various parameters of the pipeline. The pipelined model consumes very low power (0.129 W), has a short path delay (11.180 ns) and low LUT utilization (421 LUTs), and achieves a higher frequency (285.583 MHz), obtaining better results than other models.
KEYWORDS
MATLAB, SPARTAN3E, MIPS RISC processor, Xilinx, Digital Clock Manager.
1. INTRODUCTION
Currently, VLSI digital system design is burdened with many complex features. Multitasking and parallelism can make a system slow and consume more power; to meet customer requirements, designers have to compromise on the critical factors [1].
One good way to overcome this problem is to implement the pipelining technique in VLSI system design [2]. Pipelining is an implementation technique in which several operations of different instructions are performed simultaneously to optimize the speed, area, and throughput of the work [3]. Applying instruction pipelining reduces power, delay, and execution time and increases speed, along with fuller utilization of the hardware.
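The throughput benefit of pipelining can be made concrete with a simple back-of-the-envelope model (a sketch under idealized assumptions, not taken from the paper: it ignores hazards, stalls, and pipeline-register overhead):

```python
def pipelined_cycles(num_stages: int, num_instructions: int) -> int:
    """Ideal k-stage pipeline: the first instruction takes k cycles to
    fill the pipeline, then one instruction completes every cycle."""
    return num_stages + num_instructions - 1

def speedup(num_stages: int, num_instructions: int) -> float:
    """Speedup over a non-pipelined design that spends k cycles per
    instruction. Approaches k as the instruction count grows."""
    sequential = num_stages * num_instructions
    return sequential / pipelined_cycles(num_stages, num_instructions)

# With the 6 stages used in this paper and a stream of 1000 instructions:
print(pipelined_cycles(6, 1000))   # 1005 cycles instead of 6000
print(round(speedup(6, 1000), 2))  # 5.97, close to the ideal 6x
```

Real speedups fall short of this bound because branches, load-use dependences, and structural conflicts insert stall cycles, which is precisely what the hazard-handling logic in a pipelined processor exists to minimize.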
The objective of this pipelined design is to minimize power, increase speed, and obtain all the benefits of high performance. To that end, each element used in this design is power optimized. In
2. International Journal of VLSI design & Communication Systems (VLSICS) Vol 10, No 5, October 2019
2
addition to that, low power and high speed techniques enhance the results. Instead of 32-bit
architecture, 64-bit and 128-bit complexities of architecture can be tried for challenge.
The problem addressed by this proposed architecture is to design a low power, high speed pipeline
model that achieves both reduced power and reduced latency.
RISC processors are more efficient than CISC processors in several ways: they consume less power,
execute faster because the instruction set is smaller, and have simplified addressing modes and simpler
designs [4–9].
With the MIPS RISC processor, millions of instructions can be executed concurrently without
interlocking, and many advanced, desirable functions can also be executed.
The Spartan 3E FPGAs are the successors of the Spartan 3 family, with advanced features: high volume,
cost sensitivity, more logic per I/O unit, and reprogrammability without hardware replacement, which is
impossible with ASIC designs. Applications include broadband access, home networking,
display/projection, and digital television equipment; in short, it is a superior alternative to
mask-programmed ASICs.
In this research implementation, a MIPS RISC processor on a Spartan 3E family device is used to evaluate
the various parameters through the pipeline to obtain optimum throughput. Verilog HDL coding is
used to implement the instruction pipelining process on the Xilinx platform.
The organization of this paper is as follows: Section 1 (Introduction) briefly discusses the importance of
low power, the processor details, the scope of the paper, the problem definition, and the objectives.
Section 2 reviews the related works of different authors. Section 3 describes the proposed methodology,
which covers the stages of pipelining and the main elements of the processor. Section 4 explains the low
power, high speed techniques used to enhance the results. Section 5 covers hazards and their remedies.
Section 6 presents the simulation results of all six stages of pipelining and their analysis. Section 7
describes the software tools used; the related parameters are presented through 3D and 2D graphs. The
comparative analysis is carried out in Section 8, where frequency, power, LUT count, and process
technology are compared. Finally, the conclusion and future scope summarize the results and the room for
further work, respectively.
2. PROPOSED METHODOLOGY:
In this proposed methodology, a 6-stage pipelining process has been carried out. The stages are
Instruction Fetch, Instruction Decode, Register Read, Execute, Memory Access, and Write Back. The
instructions are executed deliberately and systematically, passing through all the stages. Each stage
performs its pre-determined tasks and contributes to the whole task. Registers between the stages provide
buffering during the pipeline process. Each stage is allotted one clock cycle. If any mismatches occur,
hazards may take place; the control unit produces NOP signals (stalls) to avoid flushing the pipeline. The
expected hazards are taken care of by using different hardware and software protection techniques.
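To illustrate the stage overlap described above, the cycle-by-cycle occupancy of an ideal pipeline can be modeled (a behavioral Python sketch, not the authors' Verilog RTL; stage names are taken from this section): instruction i occupies stage s in cycle i + s, so n instructions finish in n + 5 cycles instead of 6n.

```python
# Behavioral sketch of an ideal 6-stage pipeline (no hazards or stalls).
# Stage names follow the proposed model; this is illustrative Python,
# not the authors' Verilog implementation.
STAGES = ["IF", "ID", "RR", "EXE", "MEM", "WB"]

def pipeline_schedule(n_instructions):
    """Return {cycle: {stage: instruction}} for an ideal pipeline."""
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            schedule.setdefault(i + s, {})[stage] = i
    return schedule

def total_cycles(n_instructions):
    # Once the pipeline is full, one instruction completes per cycle:
    # n + (stages - 1) cycles in total, versus 6 * n without pipelining.
    return n_instructions + len(STAGES) - 1
```

For example, 10 instructions need 15 pipelined cycles versus 60 sequential cycles, so the speedup approaches 6 as the instruction stream grows.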
For minimizing power, each element in the pipeline process is carefully selected and implemented. In
addition, a low power technique, power gating (fine-grain method), is applied to all the devices in the
pipeline to minimize power consumption. Similarly, balancing the pipeline reduces latency and increases
the speed of the process.
The processor used in this architecture is a Spartan 3E family XC3S1600E device in an FG484 package.
Verilog HDL is used for coding to implement the pipeline process. The Xilinx software tool is used for
simulation to obtain the dynamic power, leakage power, and total power at different frequencies.
Load/store instructions, which are associated with the data memory, are used for read/write operations.
Figure 1. A Pipeline Data path
2.1. Stages of Pipelining
To gain speed-up and further ease of operation, the pipeline has multiple stages. All the
instructions in the pipeline follow the same sequence of simultaneous operations.
Figure 2. Flow chart of Pipelining stages
2.1.1. Instruction fetch (IF) Stage:
In this stage, the opcode of the instruction is retrieved from the Instruction Memory (or Instruction
Cache). A buffer unit attached at this stage collects the next five instruction opcodes to enhance the
performance of the pipeline.
2.1.2. Instruction Decode (ID) Stage:
In this phase, once the opcode is determined by the decoding operation, the operand specifiers drive the
search for the operands. Generally, operands are available in the Register Bank or Data Memory.
2.1.3. Calculate Operands (CO) Stage:
The effective address of each operand is calculated in this stage, and operands are located according to
the addressing mode. Indirect addressing takes two clock cycles to locate an operand; direct and
immediate addressing need only one cycle to capture the data.
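The cycle counts above can be captured in a small lookup (an illustrative Python sketch; summing the per-operand latencies is a simplifying assumption for illustration, not a statement about the RTL):

```python
# Operand-location latency per addressing mode, as stated in this section:
# indirect addressing needs an extra memory access, so two cycles;
# direct and immediate addressing need only one. (Illustrative sketch.)
OPERAND_CYCLES = {"immediate": 1, "direct": 1, "indirect": 2}

def operand_latency(modes):
    """Total clock cycles to locate all operands of an instruction."""
    return sum(OPERAND_CYCLES[m] for m in modes)

# e.g. one direct plus one indirect operand -> 3 cycles in total
```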
2.1.4. Fetch Operands (FO) Stage:
Generally, all operands reside in the large Data Memory. To save time and for ease of operation, a Data
Cache is used. Operands may also be fetched from registers.
2.1.5. Execute Instruction (EI) Stage:
In this stage, the indicated operations are performed by the Arithmetic Logic Unit. This unit naturally
consumes more power and needs high processing speed.
2.1.6. Write back (WB) Stage:
The results are transferred to the Data Cache or Data Memory, and the registers are updated with the new
information and flag status.
2.2. Fundamental programmable functional elements:
Figure 3. Block Diagram of Processor Core
2.2.1. Main components
2.2.1.1. Configurable Logic Blocks (CLB) :
The CLBs perform logic functions and provide data storage.
2.2.1.2. Input and output Blocks (IOBs) :
The IOBs manage the data flow between the input/output pins and the internal logic. There are four IOB
banks: Top (B0), Right (B1), Bottom (B2), and Left (B3). Each I/O supports input and output flow
together with a three-state bus buffer, and provides unidirectional and bidirectional interfaces to the FPGA
internal logic. The unidirectional input block is the only block with a sub-set of the full IOB capabilities;
it has no connections or logic for an output path.
2.2.1.3. Block RAM:
This unit stores data in single-port and dual-port blocks. It also contains a dual-port RAM
conflict-resolution block, which resolves data hazards and structural hazards.
2.2.1.4. Multiplier Blocks:
Multiplication is performed here on 18-bit binary operands.
2.2.1.5. Digital Clock Manager (DCM):
It is a self-calibrating system that can perform functions such as delaying, multiplying, dividing, and
phase shifting the clock.
2.2.2. Auxiliary components
2.2.2.1. Package Marking:
The “quad flat packages” are used in this core processor.
2.2.2.2. Input Delay Option:
The programmable delay block delays the signal when required. This adjusts the path delay when input
flip-flops are used with a global clock. The delay value is assigned through the IBUF_DELAY_VALUE
attribute.
2.2.2.3. Storage Element Functions:
Three pairs of storage elements (edge-triggered D flip-flops or level-sensitive latches) work together with
a special multiplexer to produce double data rate transmission. A register-cascade feature simplifies the
operation and enhances the speed.
2.2.2.4. Keeper Circuit:
It holds the last logic level on a line even after all drivers have been turned off. Pull-up and pull-down
resistors override the keeper setting.
2.2.2.5. ESD Protection :
Electrostatic discharge (ESD) damage and voltage fluctuations are mitigated by clamp diodes that protect
all the device pads in the system.
2.2.2.6. JTAG Boundary scan capabilities:
It allows debug/emulation functions regardless of the mode pin settings; selecting JTAG mode overrides
the other modes. The JTAG interface is easily cascaded across any number of FPGAs by connecting the
TDO output of one device to the TDI input of the next device in the chain.
2.2.2.7. Program boundary to third party support:
The main system boundary is extended for third-party use (programs, data, etc.) through a socket adapter.
2.2.2.8. Power Distribution System (PDS):
The PDS is designed with bypass/decoupling capacitors. The power-on reset (POR) circuit in the PDS
holds the reset state until VCCINT, VCCAUX, and VCCO Bank 2 reach their respective threshold levels.
2.2.2.9. No internal charge pumps:
This is a system protection feature: the FPGA rejects analog noise while the CCLK configuration clock is
running.
2.2.2.10. Production Stepping:
Stepping 1 devices add advanced features while remaining compatible with the earlier stepping 0 version.
2.2.2.11. Simultaneous Switching Outputs:
This feature allows the maximum number of outputs to switch simultaneously without incurring
dangerous levels of switching noise.
2.3. Main components of pipeline model:
2.3.1. Low Power ALU [10]:
The ALU performs all the complex tasks and therefore needs the most power. A low-leakage ALU is
implemented in this pipeline so that power wastage is controlled and overall power is reduced.
2.3.2. Caches:
Instead of accessing the data memory and instruction memory directly, a data cache and an instruction
cache are used in the pipeline process. These small memories are effective in locating operands quickly
and thereby save time: the most frequently used and important data is served by the caches rather than by
the large memories.
2.3.3. Data forwarding Unit and Branch and Jump Prediction Unit [11]:
A data hazard can be resolved by the data forwarding unit, which sends the result ahead of time. The
forwarding unit is an efficient mechanism that anticipates the data required and bypasses some of the
stages. A 2-bit branch and jump prediction unit stores the destination address of the branch instruction
along with the current branch status.
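The 2-bit predictor described here can be sketched behaviorally (illustrative Python; the per-PC table and the weakly-not-taken initial state are assumptions for illustration, not details given in the paper):

```python
# Sketch of a 2-bit saturating-counter branch predictor with a stored
# branch target. Counter states 0-1 predict not-taken, 2-3 predict taken;
# two consecutive mispredictions are needed to flip a stable prediction.
class TwoBitPredictor:
    def __init__(self):
        self.table = {}  # pc -> [2-bit counter, target address]

    def predict(self, pc):
        counter, target = self.table.get(pc, [1, None])
        return counter >= 2, target  # (predicted taken?, destination)

    def update(self, pc, taken, target):
        counter, _ = self.table.get(pc, [1, None])
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.table[pc] = [counter, target]
```

Storing the target alongside the counter lets the fetch stage redirect immediately on a predicted-taken branch, which is the behavior this section attributes to the prediction unit.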
2.3.4. DDR4 SDRAM Controller [12]:
The DDR4 SDRAM controller is the successor of the DDR3 SDRAM controller and is widely used in
PCs, high-end servers, smartphones, etc. It is essentially an interface that bridges the gap between
SDRAM devices and the processor sub-system. The main advantages of the DDR4 family are reduced
power, parallel bank groups, faster burst access, and better support for large-capacity memory
sub-systems. It includes advanced features such as a CRC generator, a calibration unit, and a command
generation unit.
2.3.5. Pipeline Registers [13]:
The pipeline registers consist of dual-edge-triggered implicit-pulse flip-flops (DIFF_CGS). The main
advantage of using these flip-flops in the pipeline registers is low power: they reduce register power by
almost 10%.
3. LOW POWER AND HIGH SPEED TECHNIQUES:
3.1. Power Gating [14]:
Power gating is a well-known technique used in this work: whenever a block is inactive, its circuit is
turned off automatically, saving leakage power in standby mode. The low-power units used in this work
implicitly reduce power and thus contribute to power minimization. Asynchronous blocks (especially the
ALU) are connected to synchronous blocks with handshake signals to maintain high speed and utilize
time optimally.
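The leakage saving from power gating can be estimated with a first-order duty-cycle model (a back-of-the-envelope Python sketch; the numbers below are illustrative, not measurements from this work):

```python
# First-order estimate of the leakage saved by power gating: a gated
# block leaks only during its active fraction of time. Illustrative
# numbers, not data from this paper.
def leakage_with_gating(leakage_w, active_fraction):
    """Average leakage power of a power-gated block, in watts."""
    return leakage_w * active_fraction

# e.g. an ALU leaking 20 mW but active 30% of the time averages 6 mW
```

Fine-grain gating, as used here, applies this saving per device rather than per large block, at the cost of extra sleep transistors and control logic.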
3.2. Deeper Pipeline process:
The pipeline technique is used for power saving, effective utilization of hardware and time, and maximum
speed. Pipelining does not reduce power by itself; it reduces the critical path delay by inserting registers
between the combinational logic. At the same time, the speed increases, and hence the total processing
time is reduced.
With a deeper pipeline, speed can be enhanced more effectively than with normal pipelining, because the
stages (tasks) are independent and each can finish in its own time. For example, the Instruction Decode
stage may take only one unit to complete its task, while the Execute stage may need 9 units. Giving every
stage the same long clock period is therefore not appropriate for maximum speed. Instead, deeper
pipelining divides the pipeline into short slots of two units each: the IF stage then uses just two slots
(4 units) to complete its task, whereas the Execute stage uses all 5 slots (9 units) to accomplish its task, as
shown in Figure 4.
Figure 4. Deeper Pipelining
Before deeper pipelining, the total time to complete one instruction is
Tcyc = TIF + TID + TOF + TES + TOS = 4 + 1 + 8 + 9 + 6 = 28 units.
After deeper pipelining, the cycle time is a single 2-unit slot, so
Speedup = 28 / 2 = 14.
By using the deeper pipeline method, the throughput of the pipeline process is thus enhanced by a factor
of 14.
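The arithmetic above can be reproduced directly (a Python sketch of the idealized speedup calculation; it ignores hazards and register overhead, as the text does):

```python
# Deeper-pipelining arithmetic from the text: stage times are
# T_IF=4, T_ID=1, T_OF=8, T_ES=9, T_OS=6 units, and the deeper pipeline
# clocks every 2-unit slot, so throughput is set by the slot time.
STAGE_TIMES = [4, 1, 8, 9, 6]
SLOT = 2

sequential_time = sum(STAGE_TIMES)    # 28 units per instruction, unpipelined
speedup = sequential_time / SLOT      # 28 / 2 = 14x throughput gain
```

In practice the register setup/hold overhead added at each slot boundary eats into this ideal factor, which is why real deeper pipelines gain somewhat less than the slot ratio suggests.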
4. HAZARDS
Pipeline hazards occur when one instruction cannot immediately follow another in concurrent instruction
execution. Hazards can always be resolved by waiting, but they limit the performance of the computer.
4.1. Structural Hazards:
This hazard occurs when two or more instructions need to use the same resource at the same time.
Example:
1. A single memory unit is used for both instruction fetch and data fetch.
2. Since many floating point instructions require many cycles, it is easy for them to interfere with
each other.
Dealing with structural hazards:
1. Introducing stalls: this is a low-cost, simple remedy, but it increases the clock cycles per instruction;
stalls should be minimized because they degrade performance.
2. Separate hardware resources can be allocated; they are useful for multi-cycle instructions and give good
performance, although this is sometimes complex.
3. Replicating the resources gives good performance, but it increases cost and may introduce interconnect
delay.
4.2. Data Hazards:
Data hazards occur when data is used before it is ready. As shown in the figure below, the ADD
instruction releases its result after the execute stage, but the SUB instruction needs that result even before
it is released. The solutions for data hazards are stalling, forwarding, and reordering.
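The ADD/SUB dependence just described can be checked mechanically (an illustrative Python sketch, not the RTL; the stall counts encode common textbook assumptions stated in the comments, not measurements from this design):

```python
# RAW hazard detection for the ADD/SUB example: a hazard exists when an
# instruction reads a register that an earlier in-flight instruction writes.
def raw_hazard(producer_dst, consumer_srcs):
    """True if the consumer reads the producer's destination register."""
    return producer_dst in consumer_srcs

def needs_stall(producer_is_load, distance, forwarding=True):
    """Bubbles required between a producer and a dependent consumer.

    Assumptions (illustrative, not from this paper): without forwarding
    the result is usable only after write-back, costing 3 - distance
    bubbles; with forwarding only a load-use dependence at distance 1
    still costs one bubble.
    """
    if not forwarding:
        return max(0, 3 - distance)
    if producer_is_load and distance == 1:
        return 1
    return 0

# ADD r1,r2,r3 followed immediately by SUB r4,r1,r5: a hazard on r1
# exists, but forwarding removes the stall entirely.
```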
4.2.1. Stalling: The Hazard Unit Releases Stalls to Avoid Flushing
Figure 5. Introducing Stalls
4.2.2. Forwarding: The Key Idea is to Resolve the Data Hazard by Forwarding the Data Directly to the
Next Stage, as Shown in the Figure
Figure 6. Data Forward
Reordering:
So far, only those data hazards have been addressed that the forwarding unit can detect and resolve
without affecting the performance of the pipeline. There are also unavoidable data hazards that the
forwarding unit cannot resolve; these must be handled either by stalls or by compiler reordering of the
instructions. The compiler prefers to schedule independent instructions into the slots that would otherwise
introduce delay.
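Compiler reordering can be sketched as a search for an instruction that is safe to move between a producer and its dependent consumer (an illustrative Python sketch; the (dest, srcs) tuple encoding and the dependence checks are assumptions for illustration, not the paper's compiler):

```python
# Sketch of compiler reordering: find an independent instruction to slot
# between a producer and its dependent consumer, filling the bubble with
# useful work. Instructions are (dest_register, source_registers) tuples.
def fill_bubble(producer, consumer, candidates):
    """Return the first candidate safe to move between the two, else None."""
    dest_p, _srcs_p = producer
    dest_c, srcs_c = consumer
    for cand in candidates:
        dest, srcs = cand
        independent = (
            dest_p not in srcs        # must not read the producer's result
            and dest not in srcs_c    # must not clobber a consumer source
            and dest != dest_p        # no write-after-write with producer
            and dest != dest_c        # no write-after-write with consumer
            and dest_c not in srcs    # must not read the consumer's result
        )
        if independent:
            return cand
    return None  # no safe filler: the compiler must emit a NOP (stall)
```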
4.3. Control Hazards:
When the main program branches to a sub-program, the return address must be saved in main memory
before the jump to the subroutine can be taken. A control hazard occurs because the outcome and target
of the branch are not yet known when the next instruction must be fetched.
Remedies:
- Stop loading instructions until the branch result is available.
- Assume an outcome and continue fetching, accepting that cycles are lost on a misprediction.
Figure 7. Control Hazard – Incorrect prediction
Figure 8. Control Hazard – Correct prediction
For each branch encountered during execution, branch predictor predicts whether the branch will
be taken or not.
Delayed branches:
The compiler arranges the code such that independent instructions are placed in the slot following each
branch (the branch delay slot), so the delay is filled with useful work.
5. RESULTS AND ANALYSIS:
5.1. Instruction Fetch (IF) Stage:
Figure 9. RTL schematics of Fetch Stage
This is the first pipeline stage, where the instruction memory is used to retrieve the opcode of an
instruction. The PC is then incremented by 4 (PC + 4) to point to the next instruction.
Table 1. Power consumption of Fetch Stage

Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0203              0.0002              0.0205
500               0.0204              0.0005              0.0209
750               0.0205              0.0007              0.0212
1000              0.0206              0.0010              0.0216
5.2. Instruction Decode (ID) Stage:
Figure 10. RTL Schematics of Decode Stage
In this stage, a decode operation is performed by the decoder unit, which is generally attached to the
register banks.
Table 2. Power consumption of Decode Stage

Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0203              0.0001              0.0204
500               0.0203              0.0002              0.0205
750               0.0203              0.0003              0.0206
1000              0.0203              0.0003              0.0206
5.3. Register Read (RR) Stage:
Figure 11. RTL schematics of Register Read Stage
In this stage, the effective address of the operand is calculated using the different addressing modes, and
the required data is located in the register bank or data memory.
Table 3. Power consumption of Register Read Stage

Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0204              0.0024              0.0228
500               0.0204              0.0048              0.0253
750               0.0205              0.0073              0.0278
1000              0.0206              0.0097              0.0302
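Table 3 shows the expected switching behavior: dynamic power grows almost linearly with clock frequency (consistent with the standard relation P_dyn = alpha * C * V^2 * f), while leakage stays nearly flat. A quick Python check on the tabulated values:

```python
# Dynamic power from Table 3 grows almost linearly with frequency
# (P_dyn = alpha * C * V^2 * f), so power-per-MHz is nearly constant.
freq_mhz = [250, 500, 750, 1000]
p_dyn_w  = [0.0024, 0.0048, 0.0073, 0.0097]

per_mhz = [p / f for p, f in zip(p_dyn_w, freq_mhz)]  # ~9.6-9.7 uW/MHz
spread = max(per_mhz) - min(per_mhz)                  # tiny deviation
```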
5.4. Execute (EXE) Stage:
Figure 12. RTL schematics of Execute Stage
In this stage all types of instructions are executed. Depending on the addressing mode this stage
may take one or two clock cycles.
Table 4. Power consumption of Execute Stage

Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0211              0.0025              0.0236
500               0.0217              0.0028              0.0245
750               0.0221              0.0031              0.0252
1000              0.0228              0.0038              0.0266
5.5. Memory Access (MEM) Stage:
In this stage, load/store instructions retrieve data from or supply data to the data memory; results or
stored data are exchanged between the ALU and the data memory.
Figure 13. RTL Schematics of Memory Access stage
Table 5. Power consumption of Memory Access Stage

Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0209              0.0006              0.0215
500               0.0203              0.0012              0.0216
750               0.0204              0.0018              0.0222
1000              0.0204              0.0025              0.0228
5.6. Write Back (WB) Stage:
Figure 14. Outputs of Write-back
In this stage, the results obtained in the execute stage are stored in the register banks.
Table 6. Power consumption of Write Back Stage

Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0203              0.0004              0.0207
500               0.0203              0.0007              0.0211
750               0.0203              0.0011              0.0215
1000              0.0203              0.0015              0.0218
6. SOFTWARE TOOL AND RELATED REPRESENTATION
Spartan 3E designs are processed using the Xilinx ISE 8.1i software, which also implements critical
bit-stream generator updates. Verilog HDL coding is used to obtain the data for all the stages, and
MATLAB is used to plot the relationships among the parameters.
Figure 15. Time and delay summary report
Figure 15 shows the clock arrival and finish times together with the maximum operating frequency and
path delay of the pipeline.
Figure 16. Frequency vs. Leakage Power & Dynamic Power
Figure 17. Frequency vs. Power
The Spartan 3E family FPGA device XC3S1600E, in an FG484 package, is used in this research to obtain
the parameter information for the six stages of pipelining. Figure 16 shows the relationship of dynamic
power and leakage power with frequency, and Figure 17 shows the relationship between frequency and
total power.
7. COMPARATIVE ANALYSIS
Table 7. Power comparison of various pipeline models

Device [15]                              Process Technology   LUTs   Frequency (MHz)   Power (W)
Spartan3: XC3S1500L-4FG676               90 nm                417    98.090            0.144
Virtex5: XC5VFX30T-3FF665                65 nm                300    321.048           0.777
Virtex6: XC6VLX75T-3FF784                40 nm                307    401.881           1.440
Virtex6 Low Power: XC6VLX75TL-1L-FF784   40 nm                307    335.233           0.920
Proposed Model                           90 nm                421    285.583           0.129
Figure 18. Performance Comparison of various process cores
In the device comparison graph (Figure 18), the devices are compared on parameters such as process
technology, LUTs, frequency, and power. The proposed model (Spartan 3E) consumes the least power,
0.129 W, compared with the other devices.
Table 8. Frequency and LUT comparison of various pipeline models

Parameter              Proposed Model   GPPM [16]   Low Power MIPS [17]   MIPS Core [18]   Tiny CPU [19]
Max. Frequency (MHz)   285.583          277.9       205.7                 95.5             89
LUTs                   421              1168        1890                  2340             336
In the frequency comparison, the proposed model obtains the highest maximum operating frequency
(285.583 MHz) while utilizing only 421 LUTs. Compared with the other models, the proposed model
therefore achieves better results in terms of both speed and slice utilization.
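The relative gains over the closest six-stage design (GPPM [16]) can be computed directly from the Table 8 figures (a Python sketch of the arithmetic):

```python
# Improvement of the proposed model over the GPPM [16] six-stage
# pipeline, computed from the Table 8 figures.
f_proposed, f_gppm = 285.583, 277.9       # max frequency, MHz
lut_proposed, lut_gppm = 421, 1168        # LUT utilization

freq_gain_pct = (f_proposed - f_gppm) / f_gppm * 100        # ~2.8 %
lut_saving_pct = (lut_gppm - lut_proposed) / lut_gppm * 100 # ~64 %
```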
Figure 19. Frequency Comparison
8. CONCLUSION
In this research, a MIPS RISC processor on a Spartan 3E family device is employed to implement 6-stage
pipelining. Low power, high speed techniques were employed to reduce power and increase speed. The
proposed model consumes a total power of 0.129 W, which is less than its counterparts. Similarly, it
operates at 285.583 MHz, about 3% higher than the GPPM model (Table 8). Moreover, fewer elements
are utilized, thereby optimizing the area of this design.
Various hazards are analyzed, and possible remedies are presented. The dynamic power, leakage power,
and total power are measured at various frequencies, and the parameters of the pipeline process are shown
through 3D and 2D graphs produced with the MATLAB tool. The time taken to complete the whole
pipeline process, including the propagation delay, is also noted and is on the order of nanoseconds.
Verilog HDL coding with the Xilinx software tool is used to obtain the simulation results.
In comparison with various pipelining models, the proposed model obtains the best results in terms of
power, speed, and area utilization.
9. FUTURE SCOPE
The proposed pipeline model uses a Spartan 3E device with a normal pipeline process, whereas Virtex 7
devices can use a superscalar pipeline, which is faster than both normal pipelining and super-pipelining.
In this work, the power gating technique is used to minimize power; other power minimization techniques
can be combined with it to further reduce power and increase speed. The Spartan 7 family has more
advanced features than Spartan 3E and can support a larger number of pipeline stages, which eases the
process.
REFERENCES
[1] Rashid F. Olanrewaju, Fawwaj E. Fajingbesi, S. B. Junaid, Ridzwan Alahudin, Farhat Anwar & Bisma
Rasool Pampori (2017) “Design and Implementation of a Five Stage Pipelining Architecture Simulator
for RiSC-16 Instruction Set”, Indian Journal of Science and Technology, Vol. 10, No. 3, pp 1-9.
[2] Vijaykumar J, Nagaraju B, Swapna C & Ramanujappa T (2014 April) “Design and Development of
FPGA based Low Power Pipelined 64-bit RISC Processor with Double Precision Floating Point Unit”,
International Conference on Communication and Signal Processing.
[3] Saranya Krishnamurthy, Ramani Kannan, Erman Azwan Yahya & Kishore Bingi (2017) “Design of
FIR Filter using Novel Pipelined Bypass Multiplier”, IEEE 3rd International Symposium on Robotics
and Manufacturing Automation, pp 1-6.
[4] Sneha Mangalwedhe, Roopa Kulkarni & S. Y. Kulkarni (2017) “Low Power Implementation of 32-bit
RISC Processor with Pipelining”, 2nd International Conference on Microelectronics, Computing &
Communication Systems (MCCS-2017), Bangalore.
[5] Husainali S. Bhimani, Hitesh N. Patel & Abhishek A. Davda (2016) “Design of 32-bit 3-stage Pipelined
Processor based on MIPS in Verilog HDL and Implementation on FPGA Virtex7”, International Journal
of Applied Information Systems, Vol. 10, No. 9.
[6] Rakesh M.R. (2014 April) “RISC Processor Design in VLSI Technology Using the Pipeline
Technique”, International Journal of Innovative Research in Electrical, Electronics, Instrumentation and
Control Engineering, Vol. 2, No. 4.
[7] Indu M & Arun Kumar M. (2013 August) “Design of Low Power Pipelined RISC Processor”,
International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering,
Vol. 2, No. 8.
[8] Priyanka Trivedi & Rajan Prasad Tripathi (2015) “Design & Analysis of 16 bit RISC Processor Using
Low Power Pipelining”, International Conference on Computing, Communication and Automation,
pp 1294-1297.
[9] Charu Sharma & Gurupreet Singh Saini (2017 June) “Design and Analysis of High Performance RISC
Processor using Hyperpipelining Technique”, IJASRE, Vol. 3, No. 5, pp 200-206.
[10] Meera S & Umamaheshwari D “Genetic Algorithm for Leakage Reduction through IVC using
Verilog”, International Journal of Microelectronics Engineering, Vol. 1, No. 1, pp 51-62.
[11] Zulkifli M., Yudhanto Y.P., Soetharyo N.A. & Adiono T. (2009, August) “Reduced Stall MIPS
Architecture using Pre-Fetching Accelerator”, International Conference on Electrical Engineering and
Informatics, IEEE.
[12] Md. Ashraful Islam, Md. Yeasin Arafath & Md. Jahid Hasan (2014, December) “Design of DDR4
SDRAM Controller”, 8th International Conference on Electrical and Computer Engineering, Dhaka,
Bangladesh.
[13] Liang Geng, Ji-zhong Shen & Cong-yuan Xu (2016) “Power-efficient dual-edge implicit
pulse-triggered flip-flop with an embedded clock-gating scheme”, Frontiers of Information Technology &
Electronic Engineering, Vol. 17, No. 9, pp 962-972.
[14] Aruljothi K, Prajitha PB & Rajaprabha R (2014) “Leakage Power Reduction using Power Gating and
Multi-Vt Technique”, International Journal of Advanced Research in Computer Engineering and
Technology, Vol. 3, No. 1.
[15] Narender Kumar & Munish Rattan (2015 December) “Implementation of Embedded RISC Processor
with Dynamic Power Management for Low-Power Embedded System on SOC”, IEEE Proceedings of
2015 RAECS.
[16] Nishant Kumar & Ekta Aggrawal (2013, September) “General Purpose Six-Stage Pipelined
Processor”, International Journal of Scientific & Engineering Research, Vol. 4, No. 9.
[17] Mamun Bin Ibne Reaz, Shabiul Islam & Mohd. S. Sulaiman (2002 December) “A Single Clock Cycle
MIPS RISC Processor Design using VHDL”, in Proceedings of the IEEE International Conference on
Semiconductor Electronics, Penang, Malaysia, pp 199-203.
[18] Gautham P, Parthasarathy R & Karthi Balasubramanian (2009 December) “Low Power Pipelined
MIPS Processor Design”, in Proceedings of the IEEE International Conference on Integrated Circuits,
pp 462-465.
[19] Koji Nakano, Kensuke Kawakami, Koji Shigemoto, Yuki Kamada & Yasuaki Ito (2008) “A Tiny
Processing System for Education and Small Embedded Systems on the FPGAs”, in IEEE International
Conference on Embedded and Ubiquitous Computing.
AUTHORS
PONUGUMATLA INDIRA,
M. Tech., MBA, M.Sc. (Psych), (Ph.D.).
Assistant Professor, GITAM University, Hyderabad
DR. M. KAMARAJU,
M. Tech., Ph.D.
Professor, ECE Dept.,
Gudlavalleru Engineering College, JNTUK,
Krishna District, Andhra Pradesh, India
DR. VED VYAS DWIVEDI,
M. Tech., Ph.D.
Professor, ECE Dept.
Pro-Vice Chancellor, CU Shah University,
Wadhwan – Gujarat, India