Pipelining is a technique that exploits parallelism among the instructions in a sequential instruction stream
to increase throughput and reduce the total time needed to complete the work. The major objective of this
architecture is a low-power, high-performance structure that fulfils all the requirements of the
design. Critical factors such as power, frequency, area and propagation delay are analysed on a Spartan 3E
XC3S1600E device with the Xilinx tool.
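The throughput argument above can be made concrete with a small cycle-count model. This is only a sketch under idealized assumptions that are not taken from the paper: every stage takes one clock cycle, there are no stalls or hazards, and the unpipelined machine spends one cycle per stage on each instruction. The function names and the five-stage figure in the usage note are hypothetical.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles for an ideal, stall-free pipeline.

    The first instruction needs n_stages cycles to drain through the pipe;
    every later instruction completes one cycle after its predecessor.
    """
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def ideal_speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts (assumed model only)."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )
```

For example, with an assumed five-stage pipeline and 100 instructions, the model gives 500 cycles unpipelined against 104 pipelined. The speedup approaches the stage count as the instruction stream grows, and real designs fall short of it whenever hazards force stalls.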
An octa core processor with shared memory and message-passing (eSAT Journals)
Abstract: In this era of fast, high-performance computing, efficient optimizations are needed both in the processor architecture and in the memory hierarchy. Advances in communication and multimedia applications keep driving up the number of cores in the main processor: dual-core, quad-core, octa-core and so on. Enhancing the overall performance of a multiprocessor chip, however, places stringent requirements on inter-core synchronization. Thus, an MPSoC with 8 cores supporting both message-passing and shared-memory inter-core communication mechanisms is implemented on a Virtex 5 LX110T FPGA. Each core is based on the MIPS III (Microprocessor without Interlocked Pipelined Stages) ISA, handles only integer instructions and has a six-stage pipeline with a data hazard detection unit and forwarding logic. The eight processing cores and one central shared-memory core are interconnected by a Network-on-Chip (NoC) with a 3x3 2-D mesh topology and virtual channel routers. The router has a four-stage pipeline and supports the dimension-order (DOR) X-Y routing algorithm with round-robin arbitration. To verify and functionally test the fully synthesized multi-core processor, a matrix multiplication is mapped onto it. The multiplications and additions for each element of the result matrix are partitioned and scheduled among the eight cores to maximize throughput. All the code for the processor design is written in Verilog HDL. Keywords: MPSoC, message-passing, shared memory, MIPS, ISA, wormhole router, network-on-chip, SIMD, data level parallelism, 2-D Mesh, virtual channel
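The dimension-order X-Y routing mentioned in this abstract can be sketched in a few lines. This is an illustrative software model, not the paper's Verilog router: the (x, y) coordinate scheme, the function name and the default 3x3 dimensions are assumptions, and virtual channels, arbitration and the router pipeline are not modelled.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # hypothetical (x, y) coordinate of a router in the mesh


def xy_route(src: Node, dst: Node, width: int = 3, height: int = 3) -> List[Node]:
    """Return the hop-by-hop path from src to dst under X-Y routing.

    The packet first travels along the X dimension until its column matches
    the destination, and only then along the Y dimension.
    """
    for x, y in (src, dst):
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"node {(x, y)} is outside the {width}x{height} mesh")

    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # phase 1: correct the X coordinate
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # phase 2: correct the Y coordinate
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

Under these assumptions, routing from (0, 0) to (2, 1) visits (0, 0), (1, 0), (2, 0), (2, 1). Because a packet never turns from the Y dimension back into the X dimension, the route is deterministic and its hop count equals the Manhattan distance between the two nodes.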
Design & Simulation of RISC Processor using Hyper Pipelining Technique (IOSR Journals)
The hyper pipelining technique is different from the pipelining of instruction decoding known from
RISC processors. The point is that hyper pipelining can be applied on top of any sequential logic, for example a
RISC processor, independent of its underlying functionality. A RISC processor with pipelined instruction set
decoding can automatically be hyper pipelined to generate CMF individual RISC processors. Hyper pipelining
adds registers and can use register balancing for fine-grained timing optimizations. The
hyper pipelining method is also called “C-slow Retiming”. The main benefit is the multiplication of the core's
functionality by only implementing registers. This is a great advantage for ASICs and is obviously very attractive
for FPGAs with their already existing registers.
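The idea behind C-slow retiming can be illustrated with a behavioural model: each state register of a sequential circuit is replaced by a chain of C registers, so that C independent input streams, interleaved cycle by cycle, each see the original circuit. The sketch below assumes a hypothetical 16-bit accumulator as the "sequential logic"; it models only the functional behaviour, not the clock-frequency gain that register balancing is meant to deliver.

```python
from collections import deque
from typing import List


def step(state: int, x: int) -> int:
    """Combinational logic of the assumed circuit: a 16-bit accumulator."""
    return (state + x) & 0xFFFF


def run_original(inputs: List[int]) -> List[int]:
    """Reference circuit with a single state register."""
    state, outputs = 0, []
    for x in inputs:
        state = step(state, x)
        outputs.append(state)
    return outputs


def run_c_slow(streams: List[List[int]]) -> List[List[int]]:
    """C-slowed circuit: the state register becomes a chain of C registers.

    All streams are assumed to have equal length. On cycle t the logic
    serves stream t % C, so each stream advances once every C cycles.
    """
    c = len(streams)
    n = len(streams[0])
    regs = deque([0] * c)  # register chain, one slot per interleaved stream
    outputs: List[List[int]] = [[] for _ in range(c)]
    for cycle in range(n * c):
        thread = cycle % c
        state = regs.popleft()  # value leaving the end of the chain
        new_state = step(state, streams[thread][cycle // c])
        regs.append(new_state)  # re-enters the chain, seen again C cycles later
        outputs[thread].append(new_state)
    return outputs
```

Feeding three streams through `run_c_slow` yields, for each stream, exactly what `run_original` produces for that stream alone, which is the sense in which the same logic behaves like C independent copies of the core.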
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGA (VLSICS Design)
Adders form an almost obligatory component of every contemporary integrated circuit. The prerequisite of the adder is that it is primarily fast and secondarily efficient in terms of power consumption and chip area. Therefore, careful optimization of the adder is of the greatest importance. This optimization can be attained at two levels: circuit optimization or logic optimization. In circuit optimization the sizes of the transistors are manipulated, whereas in logic optimization the Boolean equations are rearranged (or manipulated) to optimize speed, area and power consumption. This paper focuses on the optimization of the adder through technology-independent mapping. The work presents 20 different logical constructions of a 1-bit adder cell in CMOS logic, and their performance is analyzed in terms of transistor count, delay and power dissipation. These performance issues are analyzed with Tanner EDA in TSMC MOSIS 250nm technology. From this analysis the optimized equation is chosen to construct a full adder circuit in terms of multiplexers. These logic-optimized multiplexer-based adders are incorporated in selected existing adders, namely the ripple carry adder, carry look-ahead adder, carry skip adder, carry select adder, carry increment adder and carry save adder, and their performance is analyzed in terms of area (slices used) and maximum combinational path delay as a function of size. The target FPGA device chosen for the implementation of these adders was a Spartan3E XC3S500-5FG320, using Xilinx ISE 12.1. Each adder type was implemented with bit sizes of 8, 16, 32 and 64 bits. This variety of sizes provides more insight into the performance of each adder in terms of area and delay as a function of size.
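As an illustration of what a multiplexer-based full adder looks like, the sketch below uses one well-known formulation: with p = a XOR b, the sum is cin when p is 0 and NOT cin when p is 1, while the carry-out is a when p is 0 and cin when p is 1. This is not claimed to be the optimized equation the paper selects from its 20 candidates; the function names are hypothetical, and the ripple-carry wrapper is only one of the adder structures the abstract lists.

```python
from typing import Tuple


def mux2(sel: int, d0: int, d1: int) -> int:
    """2-to-1 multiplexer: returns d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0


def full_adder_mux(a: int, b: int, cin: int) -> Tuple[int, int]:
    """1-bit full adder built from two multiplexers and one XOR.

    p = a XOR b selects between the two cases:
      p == 0 (a equals b): sum = cin,     carry-out = a
      p == 1 (a differs):  sum = NOT cin, carry-out = cin
    """
    p = a ^ b
    s = mux2(p, cin, cin ^ 1)
    cout = mux2(p, a, cin)
    return s, cout


def ripple_carry_add(x: int, y: int, width: int) -> Tuple[int, int]:
    """Chain `width` mux-based cells; returns (sum modulo 2**width, carry-out)."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder_mux((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

For instance, adding 200 and 100 at a width of 8 bits returns a sum of 44 with carry-out 1, since 300 = 256 + 44.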
Hardback solution to accelerate multimedia computation through mgp in cmp (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
INCREASING THE THROUGHPUT USING EIGHT STAGE PIPELINING (ijiert bestjournal)
The use of re-programmable logic components along with HDL languages encompasses wider and wider areas of practical applications, becoming a standard of complex digital system design. One of the basic tasks to be carried out in the process of design is obtaining the highest efficiency of the solution under design. Designers are therefore still looking for methods that make it possible to speed up design processing time. The pipelining mechanism is one of these methods. It helps to speed up some dedicated operations. In the early stage of design, a given unit described in a high-level language is divided into independent parts, which are synchronized with each other via intermediate registers and a synchronization signal (pipelining mechanism). 8-stage pipelining is the key implementation technique used to make fast CPUs. It is an optimization technique used to speed up instruction execution. Throughput of an instruction pipeline is increased while latency is decreased for each instruction execution. This new 8-stage pipelining includes two instruction fetch, one instruction decode, two execution, two memory and one write back stages. It describes advantages of both speed and suitability for synthesizable RISC design.
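The stage split described in this abstract (two fetch, one decode, two execute, two memory, one write back) can be visualized with a simple occupancy table. The stage labels below are assumed names, not taken from the paper, and the model is idealized: one cycle per stage and no stalls, flushes or forwarding.

```python
from typing import Dict, List

# Assumed labels for the eight stages listed in the abstract.
STAGES = ["IF1", "IF2", "ID", "EX1", "EX2", "MEM1", "MEM2", "WB"]


def pipeline_schedule(n_instructions: int) -> List[Dict[str, int]]:
    """Return, for every clock cycle, which instruction occupies which stage.

    Instruction i enters IF1 on cycle i and advances one stage per cycle,
    so on a given cycle the stage at position s holds instruction cycle - s.
    """
    if n_instructions <= 0:
        return []
    total_cycles = len(STAGES) + n_instructions - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for position, name in enumerate(STAGES):
            instr = cycle - position
            if 0 <= instr < n_instructions:
                row[name] = instr
        schedule.append(row)
    return schedule
```

With four instructions the table has 11 rows, and on cycle 7 instruction 0 is in WB while instruction 3 is in EX2. In this model each instruction still passes through all eight stages, so any reduction in its latency would have to come from the shorter clock period that the finer stage split allows.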
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A NETWORK-BASED DAC OPTIMIZATION PROTOTYPE SOFTWARE 2 (1).pdf (SaiReddy794166)
The International Journal of Engineering and Science and Research is an online journal published in English. Its aim is to publish peer-reviewed and research articles without delay in the developing fields of engineering and science research.
STUDY OF VARIOUS FACTORS AFFECTING PERFORMANCE OF MULTI-CORE PROCESSORS (ijdpsjournal)
Advances in integrated circuit processing allow for more microprocessor design options. As chip multiprocessor (CMP) systems become the predominant topology for leading microprocessors, critical components of the system are now integrated on a single chip. This enables sharing of computation resources that was not previously possible. In addition, the virtualization of these computation resources exposes the system to a mix of diverse and competing workloads. On-chip cache memory is a resource of primary concern, as it can be dominant in controlling overall throughput. This paper analyses various parameters affecting the performance of multi-core architectures, such as the number of cores and the L2 cache size; the directory size is also varied from 64 to 2048 entries on 4-node, 8-node, 16-node and 64-node chip multiprocessors. The results point to an open area of research on multi-core processors with private/shared last-level cache, as the future trend seems to be towards tiled architectures executing multiple parallel applications with optimized silicon area utilization and excellent performance.
A Unique Test Bench for Various System-on-a-Chip (IJECEIAES)
This paper discusses a standard flow by which an automated, constrained-random test bench environment can efficiently verify an SoC for functionality and coverage. Today, in the time of multimillion-gate ASICs, reusable intellectual property (IP) and system-on-a-chip (SoC) designs, verification consumes about 70% of the design effort. Automation means a machine completes a task autonomously, more quickly and with predictable results. Automation requires standard processes with well-defined inputs and outputs. With this efficient methodology it is possible to provide a general-purpose automation solution for verification, given today's technology. Tools automating various portions of the verification process are being introduced. Here, we have a communication-based SoC. The paper discusses the methodology used to verify such an SoC-based environment. Cadence Efficient Verification Methodology libraries are explored for the solution of this problem. This can be taken as a state-of-the-art approach to verifying SoC environments. The goal of this paper is to emphasize the unique test bench for different SoCs using Efficient Verification Constructs implemented in SystemVerilog for SoC verification.
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr... (IDES Editor)
In this paper, we have proposed a novel architectural
technique which can be used to boost performance of modern
day processors. It is especially useful in certain code constructs
like small loops and try-catch blocks. The technique is aimed
at improving performance by reducing the number of
instructions that need to enter the pipeline itself. We also
demonstrate its working in a scalar pipelined soft-core
processor developed by us. Lastly, we present how a superscalar
microprocessor can take advantage of this technique and
increase its performance.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field of Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected for publication through double peer review to ensure originality, relevance and readability. The articles published in our journal can be accessed online.
The proposed system is an efficient implementation of a 16-bit Multiplier Accumulator using the Radix-8 and Radix-16 modified Booth algorithm and other adders (SPST adder, carry select adder, parallel prefix adder) in VHDL (Very High Speed Integrated Circuit Hardware Description Language). The proposed system provides low power, high speed and lower delay. For both Booth multipliers, the power consumption (mW) and estimated delay (ns) are calculated and compared. Digital signal processing applications such as the fast Fourier transform, finite impulse response filtering and convolution need high-speed, low-power MAC (Multiplier and Accumulator) units to construct an adder. By reducing the glitches (from 1 to 0 transition) and spikes (from 0 to 1 transition), the speed of operation is improved and dynamic power is reduced. The adder designed with SPST avoids the unwanted glitches and spikes and reduces the switching power dissipation and the dynamic power. The speed can be improved by reducing the number of partial products to half, by grouping bits in the multiplier term. The proposed Radix-8 and Radix-16 modified Booth algorithm MAC with SPST reduces the delay and achieves lower power consumption than an array MAC.
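The way Booth recoding reduces the number of partial products by grouping multiplier bits can be sketched as follows. This is a generic software model of radix-2^k modified Booth recoding, not the paper's VHDL design: the function names are hypothetical, and the SPST adder and accumulator stage are not modelled. Each digit is formed from k multiplier bits plus one overlapping bit from the group below, with the top bit of the group weighted negatively.

```python
from typing import List


def booth_digits(y: int, n_bits: int, radix_bits: int) -> List[int]:
    """Recode a signed n_bits multiplier into radix-2**radix_bits Booth digits.

    Digit i examines bits [i*k - 1, i*k + k - 1] of y (k = radix_bits) and
    lies in the range [-2**(k-1), 2**(k-1)]. Python's arithmetic right shift
    sign-extends y, which covers widths that are not a multiple of k.
    """
    n_digits = -(-n_bits // radix_bits)  # ceiling division

    def bit(j: int) -> int:
        return 0 if j < 0 else (y >> j) & 1

    digits = []
    for i in range(n_digits):
        base = i * radix_bits
        d = bit(base - 1)  # overlap bit from the group below
        for j in range(radix_bits - 1):
            d += bit(base + j) << j
        d -= bit(base + radix_bits - 1) << (radix_bits - 1)  # negative top bit
        digits.append(d)
    return digits


def booth_multiply(x: int, y: int, n_bits: int = 16, radix_bits: int = 3) -> int:
    """Multiply by summing one shifted partial product per Booth digit."""
    acc = 0
    for i, d in enumerate(booth_digits(y, n_bits, radix_bits)):
        acc += (d * x) << (radix_bits * i)
    return acc
```

For a 16-bit multiplier this recoding yields 8 digits at radix 4 (`radix_bits=2`), 6 at radix 8 and 4 at radix 16, so the number of partial products to accumulate shrinks as the radix grows, at the cost of a wider digit set.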
Design And Verification of AMBA APB Protocol (IJERA Editor)
Advanced Microcontroller Bus Architecture (AMBA) is a well-established open specification for the proper management of functional blocks comprising systems-on-chip (SoCs). A Memory Controller is designed to cater to this problem. This design presents an intellectual property (IP) for inter-Advanced Peripheral Bus (APB) protocol. The Memory Controller is a digital circuit which manages the flow of data going to and from the main memory. It can be a separate chip or can be integrated into the system chipset. This paper revolves around building an AMBA-compliant Memory Controller as an Advanced High-performance Bus (AHB) slave. The work covers the APB protocol and the verification of its slave. The whole design is captured in VHDL, simulated with ModelSim and targeted to an FPGA device belonging to the Virtex4 family using Xilinx tools.
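For readers unfamiliar with APB, the protocol's transfer sequence can be summarized as a three-state machine (IDLE, SETUP, ACCESS) that drives PSEL and PENABLE and waits on PREADY. The sketch below is a behavioural model of that sequence as generally described for APB with a PREADY signal; it is not the paper's VHDL, and the Python names and the `transfer_requested` input are illustrative assumptions.

```python
from enum import Enum
from typing import Tuple


class ApbState(Enum):
    IDLE = "IDLE"
    SETUP = "SETUP"
    ACCESS = "ACCESS"


def next_state(state: ApbState, transfer_requested: bool, pready: bool) -> ApbState:
    """Advance the APB operating-state machine by one clock cycle."""
    if state is ApbState.IDLE:
        return ApbState.SETUP if transfer_requested else ApbState.IDLE
    if state is ApbState.SETUP:
        return ApbState.ACCESS  # SETUP always lasts exactly one cycle
    # ACCESS: hold while the slave inserts wait states (PREADY low)
    if not pready:
        return ApbState.ACCESS
    return ApbState.SETUP if transfer_requested else ApbState.IDLE


def outputs(state: ApbState) -> Tuple[int, int]:
    """Return (PSEL, PENABLE) for the given state."""
    return {
        ApbState.IDLE: (0, 0),
        ApbState.SETUP: (1, 0),
        ApbState.ACCESS: (1, 1),
    }[state]
```

A transfer without wait states therefore takes two cycles (SETUP, then ACCESS), and each cycle in which the slave holds PREADY low extends the ACCESS phase by one cycle.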
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIP (ijaceeejournal)
The design of more complex systems becomes an increasingly difficult task because of the different issues
related to latency, design reuse, throughput and cost that have to be considered while designing. In
real-time applications there are different communication needs among the cores. When NoCs (Networks
on Chip) are the means to interconnect the cores, techniques to optimize the
communication are indispensable. From the performance point of view, large buffer sizes ensure
performance during the execution of different applications. Unfortunately, these same buffers are mainly
responsible for the router's total power dissipation. Moreover, sizing buffers for the worst-case
latency incurs extra dissipation in the mean case, which is much more frequent. A reconfigurable router
architecture for NoC is designed in which processing elements communicate over a second
communication level using direct links to other node elements. Several possibilities to use the
router as an additional resource to enhance the complexity of modules are presented. The reconfigurable router
is evaluated in terms of area, speed and latency. The proposed router was described in VHDL and
simulated with the ModelSim tool. The average power consumption, area and frequency
results are analysed for a standard cell library using the Design Compiler tool. With the reconfigurable router it was
possible to reduce congestion in the network while at the same time reducing power dissipation and
improving energy efficiency.
More Related Content
Similar to DESIGN AND ANALYSIS OF A 32-BIT PIPELINED MIPS RISC PROCESSOR
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIPijaceeejournal
The design of more complex systems becomes an increasingly difficult task because of different issues
related to latency, design reuse, throughput and cost that has to be considered while designing. In
Real-time applications there are different communication needs among the cores. When NoCs (Networks
on chip) are the means to interconnect the cores, use of some techniques to optimize the
communication are indispensable. From the performance point of view, large buffer sizes ensure
performance during different applications execution. But unfortunately, these same buffers are the main
responsible for the router total power dissipation. Another aspect is that by sizing buffers for the worst case
latency incurs in extra dissipation for the mean case, which is much more frequent. Reconfigurable router
architecture for NOC is designed for processing elements communicate over a second
communication level using direct-links between another node elements. Several possibilities to use the
router as additional resources to enhance complexity of modules are presented. The reconfigurable router
is evaluated in terms of area, speed and latencies. The proposed router was described in VHDL and used
the ModelSim tool to simulate the code. Analyses the average power consumption, area, and frequency
results to a standard cell library using the Design Compiler tool. With the reconfigurable router it was
possible to reduce the congestion in the network, while at the same time reducing power dissipation and
improving energy.
International Journal of VLSI design & Communication Systems (VLSICS) Vol 10, No 5, October 2019
DOI : 10.5121/vlsic.2019.10501
DESIGN AND ANALYSIS OF A 32-BIT PIPELINED MIPS RISC PROCESSOR
P. Indira1, M. Kamaraju2 and Ved Vyas Dwivedi3
1,3 Department of Electronics and Communication Engineering, CU Shah University,
Wadhwan, Gujarat, India
2 Department of Electronics and Communication Engineering, Gudlavalleru Engineering
College, JNT University, Kakinada, Andhra Pradesh, India
ABSTRACT
Pipelining is a technique that exploits parallelism among the instructions in a sequential instruction
stream to increase throughput and reduce the total time needed to complete the work. The major objective
of this architecture is to design a low-power, high-performance structure that fulfils all the requirements
of the design. Critical factors such as power, frequency, area and propagation delay are analysed using the
Spartan 3E XC3S1600E device with the Xilinx tool. In this paper, a 32-bit MIPS RISC processor with 6-stage
pipelining is used to optimize the critical performance factors. The fundamental functional blocks of the
processor include Input/Output Blocks, Configurable Logic Blocks, Block RAM and the Digital Clock Manager,
and each block can connect to multiple sources for routing. The auxiliary units enhance the performance of
the processor. A comparative study evaluates the designed model in terms of area, power and frequency.
MATLAB 2D/3D graphs represent the relationships among the various parameters of this pipeline. The
pipeline model consumes very little power (0.129 W), has a path delay of 11.180 ns and has low LUT
utilization (421). The proposed model also achieves a higher frequency (285.583 MHz), giving better
results than the other models.
KEYWORDS
MATLAB, SPARTAN3E, MIPS RISC processor, Xilinx, Digital Clock Manager.
1. INTRODUCTION
Currently, VLSI digital system design is overburdened with many complex features. Multitasking
and parallelism, introduced to meet customer requirements, make the system slower and more
power-hungry, so designers have to compromise on the critical factors [1].
One effective way to overcome this problem is to implement the pipelining technique in VLSI
system design [2]. Pipelining is an implementation technique in which the operations of several
instructions are overlapped in time to optimize the speed, area and throughput of the work [3].
Applying instruction pipelining reduces power, delay and execution time and increases speed, while
making full use of the hardware.
The objective of this pipeline process is to minimize the power, increase the speed and obtain all
the benefits of high performance. To that end, each element used in this design is power optimized. In
addition, low-power and high-speed techniques enhance the results. Beyond the 32-bit
architecture, more complex 64-bit and 128-bit architectures can be attempted as a further challenge.
The problem addressed by the proposed architecture is to design a low-power, high-speed pipeline
model that reduces both power consumption and latency while maintaining high performance.
RISC processors are more efficient than CISC processors in several ways: they consume less
power, execute faster because they have fewer instructions, and have simplified addressing modes
and simpler designs [4–9].
With the MIPS RISC processor, millions of instructions can be executed concurrently without
interlocking, and many advanced functions can also be performed.
The Spartan 3E FPGAs are the successors of the Spartan 3 family, with advanced features aimed at
high-volume, cost-sensitive applications: more logic per I/O and the ability to update the design by
reprogramming without replacing hardware, which is impossible with ASIC designs.
Applications include broadband access, home networking, display/projection and digital
television equipment; in short, the family is a superior alternative to mask-programmed ASICs.
In this research implementation, a MIPS RISC processor on a Spartan 3E family device is used to
evaluate the various parameters through the pipeline to obtain optimum throughput. Verilog HDL
coding is used to implement the instruction pipelining process on the Xilinx platform.
The organization of this paper is as follows. A brief discussion of the importance of low power,
the processor details, the scope of the paper, the problem definition and the objectives is given in
Section 1 (Introduction). Section 2 reviews the related works of different authors. Section 3 describes
the proposed methodology, which contains the stages of pipelining and the main elements of the processor.
Section 4 explains the low-power, high-speed techniques used to enhance the results. Section 5 is about
hazards and their remedies. Section 6 presents the simulation results of all six stages of pipelining
and their analysis. Section 7 describes the software tools used, and the related parameters are
presented through 3D and 2D graphs. The comparative analysis is carried out in Section 8, where
frequency, power, LUT and process technology are compared. Finally, the conclusion and future scope
summarize the results and the room for further work, respectively.
2. PROPOSED METHODOLOGY:
In the proposed methodology, a 6-stage pipelining process is carried out. The stages are the
Instruction Fetch Stage, Instruction Decode Stage, Register Read Stage, Memory Access Stage,
Data Memory Stage and Write Back Stage. Instructions are executed systematically by passing
through all the stages. Each stage performs its pre-determined task and contributes to the whole
task. Registers between the stages buffer data during the pipeline process. Each stage is allotted
one clock cycle. If any mismatch occurs, hazards may arise. The control unit produces NOP signals
(stalls) to avoid flushing the pipeline. The expected hazards are handled using different hardware
and software protection techniques.
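The timing implied by "one clock cycle per stage" can be illustrated with a small behavioural model. This is only a sketch in Python, not the Verilog design of the paper; the function names and the idea of counting stalls as extra cycles are assumptions made for illustration.

```python
def total_cycles(num_instructions, num_stages=6, stall_cycles=0):
    """Cycles needed to drain an ideal pipeline: the first instruction takes
    num_stages cycles, every later one completes one cycle after the previous,
    and each stall (NOP bubble) adds one cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1) + stall_cycles


def occupancy(num_instructions, num_stages=6):
    """One row per clock cycle; entry s is the index of the instruction
    currently in stage s, or None when that stage is empty."""
    rows = []
    for cycle in range(total_cycles(num_instructions, num_stages)):
        row = []
        for stage in range(num_stages):
            instr = cycle - stage  # instruction i enters stage s in cycle i + s
            row.append(instr if 0 <= instr < num_instructions else None)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for cycle, row in enumerate(occupancy(3), start=1):
        print(cycle, row)
```

Under these assumptions, ten instructions need 6 + 9 = 15 cycles in a six-stage pipeline, compared with 60 cycles if each instruction had to finish all six stages before the next one started.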
To minimize power, each element in the pipeline is carefully selected and implemented.
In addition, a low-power technique, i.e., power gating (fine-grain method), is applied to all the
devices in the pipeline to minimize power consumption. Similarly, balancing the pipeline reduces
latency and increases the speed of the process.
The processor in this architecture is implemented on a Spartan 3E family device, the XC3S1600E
in the FG484 package. The Verilog HDL programming language is used to code the pipeline process.
The Xilinx software tool is used for simulation to obtain the dynamic power, leakage power and total
power at different frequencies. Load/Store instructions are used for the read/write operations
associated with data memory.
Figure 1. A Pipeline Data path
2.1. Stages of Pipelining
To gain speed-up and ease of operation, the pipeline is divided into more stages. All instructions
in the pipeline pass through the same sequence of stages, with different instructions occupying
different stages simultaneously.
Figure 2. Flow chart of Pipelining stages
2.1.1. Instruction fetch (IF) Stage:
In this stage, the opcode of the instruction is retrieved from the Instruction Memory or
Instruction Cache. A buffer unit attached to this stage collects the next five instruction
opcodes to enhance the performance of the pipeline.
2.1.2. Instruction Decode (ID) Stage:
In this stage, once the opcode has been determined by decoding, the operand specifiers direct
the search for operands. Operands are generally available in the Register Bank or Data
Memory.
2.1.3. Calculate Operands (CO) Stage:
The effective address of each operand is calculated in this stage. Operands are located according
to the addressing mode. In indirect addressing mode, locating the operand takes 2 clock cycles.
In direct and immediate addressing modes, one clock cycle is enough to capture the data.
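The cycle counts above can be made concrete with a short sketch. It is a Python model under stated assumptions: the dictionary `memory`, the function name `effective_address` and the mode strings are illustrative and do not come from the paper.

```python
def effective_address(mode, field, memory):
    """Return (address, cycles) for the operand named by an instruction field.

    immediate: the operand is the field itself, so there is no address (1 cycle)
    direct:    the field is the operand's address (1 cycle)
    indirect:  the field points to a location holding the operand's address,
               so one extra memory read is needed (2 cycles)
    """
    if mode == "immediate":
        return None, 1
    if mode == "direct":
        return field, 1
    if mode == "indirect":
        return memory[field], 2
    raise ValueError(f"unknown addressing mode: {mode}")


if __name__ == "__main__":
    memory = {0x10: 0x20, 0x20: 99}  # hypothetical memory contents
    for mode in ("immediate", "direct", "indirect"):
        print(mode, effective_address(mode, 0x10, memory))
```

The extra cycle in the indirect case comes from the additional memory read needed to obtain the address itself before the operand can be fetched in the next stage.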
2.1.4. Fetch Operands (FO) Stage:
Generally, all operands reside in the large Data Memory. To save time and simplify operation, a
Data Cache is used. Operands may also be fetched from registers.
2.1.5. Execute Instruction (EI) Stage:
In this stage, the operations specified by the instruction are performed by the Arithmetic Logic
Unit. This unit consumes more power than the others and needs high processing speed.
2.1.6. Write back (WB) Stage:
The results are stored in the Data Cache or Data Memory, and the registers are updated with the
new values and flag status.
2.2. Fundamental programmable functional elements:
Figure 3. Block Diagram of Processor Core
2.2.1. Main components
2.2.1.1. Configurable Logic Blocks (CLB) :
The CLBs are used for performing logic functions and for data storage.
2.2.1.2. Input and output Blocks (IOBs) :
The IOBs manage the data flow between the input-output pins and the internal devices. They are
arranged in four banks: Top (B0), Right (B1), Bottom (B2) and Left (B3). Each IOB supports an input
path and an output path together with a three-state buffer, providing unidirectional or bidirectional
interfaces to the FPGA internal logic. The unidirectional input-only block has a subset of the full
IOB capabilities, so it has no connections or logic for an output path.
2.2.1.3. Block RAM:
This unit stores data in single-port and dual-port blocks. It also includes a dual-port RAM
conflict-resolution block, which resolves data hazards and structural hazards.
2.2.1.4. Multiplier Blocks:
Multiplication of 18-bit binary numbers is performed here.
2.2.1.5. Digital Clock Manager (DCM):
It is a “self-calibrating” system that can delay, multiply, divide and phase-shift clock
signals.
2.2.2. Auxiliary components
2.2.2.1. Package Marking:
The “quad flat packages” are used in this core processor.
2.2.2.2. Input Delay Option:
The “programmable delay block” delays the signal whenever required. This adjusts the path delay
when input flip-flops are used with a global clock.
The delay values are assigned as follows:
IBUF_DELAY_VALUE
2.2.2.3. Storage Element Functions:
Three pairs of storage elements (edge-triggered D flip-flops or level-sensitive latches) work
together with a special multiplexer to produce double data rate transmission.
A register cascade feature simplifies the operation and enhances the speed.
2.2.2.4. Keeper Circuit:
It holds the last logic level even after all drivers have been turned off. Pull-up and pull-down
resistors override the keeper settings.
2.2.2.5. ESD Protection :
Damage from electrostatic discharge (ESD) and voltage fluctuations is prevented by clamp
diodes, which protect all the device pads in the system.
2.2.2.6. JTAG Boundary scan capabilities:
It allows debug/emulation functions regardless of the mode pin settings. Selecting JTAG mode
cancels the other modes.
The JTAG interface is easily cascaded across any number of FPGAs by connecting the TDO output of
one device to the TDI input of the next device in the chain.
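The TDO-to-TDI cascading can be pictured with a toy model in which every device in the chain is in the JTAG BYPASS state, where a device presents a single-bit register between TDI and TDO. The sketch below is in Python and is only illustrative; the class and function names are hypothetical, and TCK, TMS and the TAP state machine are not modelled.

```python
class BypassDevice:
    """One device in BYPASS: a single flip-flop between TDI and TDO."""

    def __init__(self):
        self.bit = 0

    def clock(self, tdi):
        tdo = self.bit   # value driven on TDO before the clock edge
        self.bit = tdi   # capture TDI on the clock edge
        return tdo


def shift_through_chain(devices, bits_in):
    """Shift bits into the first TDI and collect what leaves the last TDO."""
    bits_out = []
    for bit in bits_in:
        signal = bit
        # TDO of each device feeds TDI of the next; passing the *old* register
        # value down the chain models all devices clocking on the same edge.
        for device in devices:
            signal = device.clock(signal)
        bits_out.append(signal)
    return bits_out


if __name__ == "__main__":
    chain = [BypassDevice() for _ in range(3)]
    print(shift_through_chain(chain, [1, 0, 1, 1, 0, 0, 0]))
```

With three devices in the chain, each input bit appears at the final TDO three clock cycles later, which is why the chain length must be known when addressing a particular device.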
2.2.2.7. Program boundary to third party support:
The main system boundary is extended to third-party use for programs, data, etc., which are
connected through a socket adapter.
2.2.2.8. Power Distribution System (PDS):
The PDS is designed with bypass/decoupling capacitors. The power-on reset (POR) circuit in the
PDS holds the device in reset until the VCCINT, VCCAUX and VCCO Bank 2 supplies reach their
respective threshold levels.
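The POR condition is a simple conjunction over the three supply rails, which the following Python sketch illustrates. The threshold voltages used here are placeholders chosen for the example and are not datasheet values; the names `EXAMPLE_THRESHOLDS_V` and `reset_asserted` are likewise hypothetical.

```python
# Placeholder thresholds in volts: illustrative only, NOT taken from a datasheet.
EXAMPLE_THRESHOLDS_V = {"VCCINT": 1.0, "VCCAUX": 2.0, "VCCO_BANK2": 1.0}


def reset_asserted(rails_v, thresholds_v=EXAMPLE_THRESHOLDS_V):
    """Reset stays asserted until every monitored rail has reached its threshold."""
    return any(rails_v[name] < level for name, level in thresholds_v.items())


if __name__ == "__main__":
    # A made-up power-up ramp: the rails rise at different rates.
    ramp = [
        {"VCCINT": 0.2, "VCCAUX": 0.5, "VCCO_BANK2": 0.3},
        {"VCCINT": 1.1, "VCCAUX": 1.5, "VCCO_BANK2": 0.9},
        {"VCCINT": 1.2, "VCCAUX": 2.4, "VCCO_BANK2": 1.0},
    ]
    for step, rails in enumerate(ramp):
        print(step, "reset" if reset_asserted(rails) else "released")
```

Reset is released only at the step where the slowest rail crosses its threshold, regardless of the order in which the rails come up.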
2.2.2.9. No internal charge pumps:
It is a system protection feature. This feature allows the FPGA to reject analog noise when the
CCLK configuration clock is ON.
2.2.2.10. Production Stepping:
This feature allows the advanced features of Stepping 1 to be combined with those of the previous Stepping 0.
2.2.2.11. Simultaneous Switching Outputs:
This feature allows the maximum number of concurrently switching outputs to operate without
the risk of switching noise.
2.3. Main components of pipeline model:
2.3.1. Low Power ALU [10]:
The ALU performs all the complex tasks, which require more power. A low-leakage ALU model is therefore
implemented in this pipeline so that wasted power is controlled and overall power is reduced.
2.3.2. Caches:
Instead of data memory and instruction memory, a data cache and an instruction cache are used in the
pipeline. These small memories locate operands quickly and thereby save time. The most frequently used
and important data is accessed more readily from the caches than from the large memories.
2.3.3. Data forwarding Unit and Branch and Jump Prediction Unit [11]:
Data hazards can be resolved by the data forwarding unit, which sends a result onward before it is
written back. The forwarding unit is an efficient technique that anticipates the data required and
bypasses some of the stages. A 2-bit branch and jump prediction unit stores the destination address of
the branch instruction and the current branch status.
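A common way to realise a 2-bit prediction unit that keeps both a branch status and a destination address is a saturating counter per branch together with a small target table. The Python sketch below illustrates that general scheme; it is an assumption about how such a unit could behave, not the circuit of [11], and the class name, the initial counter state and the dictionary-based tables are all hypothetical.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counter plus a branch-target table.

    Counter states: 0, 1 predict not taken; 2, 3 predict taken.
    """

    def __init__(self, initial_state=1):
        self.initial_state = initial_state
        self.counters = {}  # branch address -> counter state (0..3)
        self.targets = {}   # branch address -> last taken destination

    def predict(self, pc):
        """Return (predicted_taken, predicted_target_or_None)."""
        taken = self.counters.get(pc, self.initial_state) >= 2
        return taken, (self.targets.get(pc) if taken else None)

    def update(self, pc, taken, target=None):
        """Train the counter with the actual outcome of the branch."""
        state = self.counters.get(pc, self.initial_state)
        self.counters[pc] = min(3, state + 1) if taken else max(0, state - 1)
        if taken:
            self.targets[pc] = target


if __name__ == "__main__":
    predictor = TwoBitPredictor()
    outcomes = [True, True, True, False, True]  # a made-up loop branch history
    for actual in outcomes:
        guess, _ = predictor.predict(0x40)
        print("predicted", guess, "actual", actual)
        predictor.update(0x40, actual, target=0x10)
```

Because the counter needs two consecutive mispredictions to change direction, a single loop exit does not disturb the prediction for the next execution of the loop.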
2.3.4. DDR4 SDRAM Controller [12]:
DDR4 SDRAM is the successor to DDR3 SDRAM and is widely used in PCs, high-end servers, smartphones,
etc. The controller is an interface that bridges the gap between the SDRAM devices and the processor
sub-system. The main advantages of the DDR4 family are reduced power, parallel bank groups, faster
burst access and better support for large-capacity memory sub-systems. The controller has further
advanced features: CRC generator, calibration unit, command generation unit, etc.
2.3.5. Pipeline Registers [13]:
The pipeline registers consist of dual-edge-triggered implicit flip-flops (DIFF_CGS). The main advantage
of using this type of flip-flop in the pipeline registers is low power: power is reduced by almost 10%.
3. LOW POWER AND HIGH SPEED TECHNIQUES:
3.1. Power Gating [14]:
Power gating is a well-known technique and is used in this work. Whenever the system is in the
inactive state, the circuit is turned off automatically, which saves leakage power in standby
mode. The low-power units used in this work also contribute implicitly to power minimization.
Asynchronous blocks (especially the ALU) are connected to synchronous blocks through handshake
signals in this design to maintain high speed and use time optimally.
3.2. Deeper Pipeline process:
Pipelining is used to save power, to use hardware and time effectively, and to obtain maximum
speed. Pipelining does not in fact reduce power by itself; it reduces the critical path delay by
inserting registers between blocks of combinational logic. As a result, speed increases and the
total processing time is reduced.
With deeper pipelining, speed can be enhanced more effectively than with normal pipelining. For a
6-stage pipeline, suppose each stage is given 10 units of time. Each stage (task) is independent
and can finish in its own time. For example, the Instruction Decode stage may take only two units
to complete its task while the Execute stage may need 9 units. Giving equal time to every stage is
therefore not appropriate for obtaining maximum speed, and deeper pipelining can be introduced to
increase it. To do so, the entire pipeline is divided into slots of two time units each. The IF
stage then uses just two slots (4 units) to complete its task, whereas the Execute stage uses all
five slots (9 units), as shown in Figure 4.
Figure 4. Deeper Pipelining
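Before deeper pipelining, the total time taken to complete the task is

$$T_{cyc} = T_{IF} + T_{ID} + T_{OF} + T_{ES} + T_{OS} = 4 + 1 + 8 + 9 + 6 = 28 \text{ units}.$$

After deeper pipelining, with a slot of 2 units:

$$\text{Speed-up} = 28 / 2 = 14.$$

By using the deeper pipeline method, the speed of the pipeline can be enhanced by up to 14 times in
the ideal case. The speed-up is a ratio and therefore has no unit.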
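The arithmetic above can be reproduced with a short Python sketch. The stage times are the ones used in the equation (4, 1, 8, 9 and 6 units) and the slot length is the two-unit slot from the text; the function names and the rounding-up of partially used slots are assumptions made for illustration.

```python
import math

# Stage times in time units, taken from the worked equation in the text.
STAGE_TIME = {"IF": 4, "ID": 1, "OF": 8, "ES": 9, "OS": 6}
SLOT = 2  # slot length in time units


def slots_per_stage(stage_time, slot):
    """Number of slots each stage occupies; a partly used slot counts as a whole one."""
    return {name: math.ceil(t / slot) for name, t in stage_time.items()}


def ideal_speedup(stage_time, slot):
    """Total unpipelined time divided by the slot length (ideal case, no hazards)."""
    return sum(stage_time.values()) / slot
```

With these values the Execute stage (9 units) occupies five slots and the IF stage (4 units) occupies two, which matches the description of Figure 4.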
4. HAZARDS
Pipeline hazards occur when one instruction cannot immediately follow another during concurrent instruction
execution. Hazards can always be resolved by waiting, but they limit the performance of the computer.
4.1. Structural Hazards:
This hazard occurs when two or more instructions need to use the same resource at the same time.
Example:
1. One memory unit is used for both instruction fetch and data fetch.
2. Since many floating-point instructions require many cycles, it is easy for them to interfere
with each other.
Dealing with structural hazards:
1. Insert stalls: this is low-cost and simple, but it increases the clock cycles per instruction.
Stalls should be avoided where possible because they degrade performance.
2. Allocate separate hardware resources: this is useful for multi-cycle instructions and gives
good performance, although it is sometimes complex.
3. Replicate the resources: this gives good performance, but it increases the cost and may
introduce interconnect delay.
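The first example above (one memory unit shared by instruction fetch and data access) can be sketched in Python as follows. The stage order is the six-stage sequence used in the results section of this paper; the function name `memory_port_conflicts`, the single-port memory and the no-stall timing model are assumptions made only for this illustration.

```python
# Six-stage order as listed in the results section of the paper.
STAGES = ["IF", "ID", "RR", "EXE", "MEM", "WB"]


def memory_port_conflicts(uses_data_memory):
    """Return (cycle, fetching_index, memory_index) for each clash on one shared memory port.

    Assumption: instruction i enters IF in cycle i and advances one stage per
    cycle without stalls, so it reaches MEM in cycle i + 4.
    """
    mem_offset = STAGES.index("MEM") - STAGES.index("IF")
    conflicts = []
    for i, is_mem in enumerate(uses_data_memory):
        fetch_idx = i + mem_offset  # instruction being fetched while i is in MEM
        if is_mem and fetch_idx < len(uses_data_memory):
            conflicts.append((fetch_idx, fetch_idx, i))
    return conflicts
```

For example, `memory_port_conflicts([True, False, False, False, False, True, False])` reports one clash in cycle 4, where instruction 0 accesses data memory while instruction 4 is being fetched. Separate instruction and data caches, as used in this design, remove this particular conflict.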
4.2. Data Hazards:
Data hazards occur when data is used before it is ready. As shown in the figure below, the ADD
instruction produces its result only after the execute stage, but the SUB instruction needs that
result before it is available. The solutions for data hazards are stalling, forwarding, and reordering.
4.2.1. The Hazard Unit Inserts Stalls to Avoid Flushing
Figure 5. Introducing Stalls
4.2.2. The Key Idea Is to Resolve the Data Hazard by Forwarding the Data Directly to the Next Stage, as Shown in the Figure
Figure 6. Data Forward
Reordering:
So far, only data hazards that the forwarding unit can detect and resolve without affecting the
performance of the pipeline have been addressed. There are also unavoidable data hazards that the
forwarding unit cannot resolve. For these, either stalls are inserted or the compiler reorders the
instructions, giving preference to independent instructions to fill the slots that would otherwise
be delays.
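The distinction between hazards that forwarding resolves and hazards that still need a stall or reordering can be sketched as follows. This uses the textbook rule for adjacent instructions (a result produced by a load is not available in time for the very next instruction); the paper does not specify the rule used by its own hazard unit, so the `Instr` tuple, the `lw` mnemonic and the function `classify` are illustrative assumptions.

```python
from collections import namedtuple

# Hypothetical instruction record: operation, destination register, two source registers.
Instr = namedtuple("Instr", "op dest src1 src2")


def classify(producer, consumer):
    """Classify the dependency of `consumer` on the immediately preceding `producer`."""
    if producer.dest is None or producer.dest not in (consumer.src1, consumer.src2):
        return "none"     # no read-after-write dependency
    if producer.op == "lw":
        return "stall"    # load-use case: data arrives too late to forward (textbook rule)
    return "forward"      # ALU result can be bypassed to the consumer
```

For the ADD/SUB pair discussed above, `classify(Instr("add", "r1", "r2", "r3"), Instr("sub", "r4", "r1", "r5"))` gives `"forward"`, whereas replacing the ADD with a load into `r1` gives `"stall"`, which is the case a compiler would try to hide by moving an independent instruction in between.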
4.3. Control Hazards:
When the main program branches to a subprogram, the return address must first be saved in main
memory; only then does it jump to the subroutine. A control hazard occurs when the next
instruction to fetch is not yet determined because the branch has not been resolved.
Remedies:
- Stop loading instructions until the result is available.
- Assume an outcome and continue fetching, at the cost of lost cycles if it is a misprediction.
Figure 7. Control Hazard – Incorrect prediction
Figure 8. Control Hazard – Correct prediction
For each branch encountered during execution, the branch predictor predicts whether the branch
will be taken or not.
Delayed branches:
The compiler arranges the code so that independent instructions are scheduled around each branch,
filling the delay the branch introduces with useful work.
5. RESULTS AND ANALYSIS:
5.1. Instruction Fetch (IF) Stage:
Figure 9. RTL schematics of Fetch Stage
This is the first of the pipeline stages, where the instruction memory is used to retrieve the
opcode of an instruction. The PC is then incremented by 4 (PC + 4) to point to the next address to fetch.
Table 1. Power consumption of Fetch Stage
Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0203              0.0002              0.0205
500               0.0204              0.0005              0.0209
750               0.0205              0.0007              0.0212
1000              0.0206              0.0010              0.0216
5.2. Instruction Decode (ID) Stage:
Figure 10. RTL Schematics of Decode Stage
In this stage a decode operation is performed by a decoder unit, which is generally attached to the
register banks.
Table 2. Power consumption of Decode Stage
Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0203              0.0001              0.0204
500               0.0203              0.0002              0.0205
750               0.0203              0.0003              0.0206
1000              0.0203              0.0003              0.0206
5.3. Register Read (RR) Stage:
Figure 11. RTL schematics of Register Read Stage
In this stage, the effective address of the operand is calculated according to the addressing mode,
and the required data is located in the register bank or data memory.
Table 3. Power consumption of Register Read Stage
Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0204              0.0024              0.0228
500               0.0204              0.0048              0.0253
750               0.0205              0.0073              0.0278
1000              0.0206              0.0097              0.0302
5.4. Execute (EXE) Stage:
Figure 12. RTL schematics of Execute Stage
In this stage all types of instructions are executed. Depending on the addressing mode, this stage
may take one or two clock cycles.
Table 4. Power consumption of Execute Stage
Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0211              0.0025              0.0236
500               0.0217              0.0028              0.0245
750               0.0221              0.0031              0.0252
1000              0.0228              0.0038              0.0266
5.5. Memory Access Stage:
In this stage load/store instructions are used to retrieve data from, or supply data to, the data
memory. Either results or stored data are exchanged between the ALU and the data memory.
Figure 13. RTL Schematics of Memory Access stage
Table 5. Power consumption of Memory Access Stage
Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0209              0.0006              0.0215
500               0.0203              0.0012              0.0216
750               0.0204              0.0018              0.0222
1000              0.0204              0.0025              0.0228
5.6. Write-Back (WB) Stage:
Figure 14. Outputs of Write-back
In this stage, the results obtained in the execution stage are stored in the register banks.
Table 6. Power consumption of Write Back Stage
Frequency (MHz)   Leakage Power (W)   Dynamic Power (W)   Total Power (W)
250               0.0203              0.0004              0.0207
500               0.0203              0.0007              0.0211
750               0.0203              0.0011              0.0215
1000              0.0203              0.0015              0.0218
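The per-stage totals in Tables 1 to 6 can be added up per frequency with a short Python sketch. The values are copied from the "Total Power (W)" columns of the tables; the function name `summed_power` is hypothetical, and whether a simple sum is the right way to combine the stages (each stage total appears to include a similar leakage figure) is an assumption, since the paper does not state how the overall power was derived.

```python
FREQ_MHZ = [250, 500, 750, 1000]

# "Total Power (W)" columns of Tables 1 to 6, one list per stage, ordered as FREQ_MHZ.
TOTAL_POWER_W = {
    "IF":  [0.0205, 0.0209, 0.0212, 0.0216],
    "ID":  [0.0204, 0.0205, 0.0206, 0.0206],
    "RR":  [0.0228, 0.0253, 0.0278, 0.0302],
    "EXE": [0.0236, 0.0245, 0.0252, 0.0266],
    "MEM": [0.0215, 0.0216, 0.0222, 0.0228],
    "WB":  [0.0207, 0.0211, 0.0215, 0.0218],
}


def summed_power():
    """Sum the six stage totals at each frequency, rounded to four decimals."""
    return {
        freq: round(sum(column[i] for column in TOTAL_POWER_W.values()), 4)
        for i, freq in enumerate(FREQ_MHZ)
    }
```

Under this assumption the sum at 250 MHz is about 0.1295 W, which is close to the 0.129 W reported for the proposed model in Table 7, and it rises to about 0.1436 W at 1000 MHz.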
6. SOFTWARE TOOL AND RELATED REPRESENTATION
Spartan 3E applications can be processed using the Xilinx ISE 8.1i software, which also
implements critical bit stream generator updates.
Verilog HDL code is used to obtain the data for all the stages. The MATLAB tool is also used to
plot the relationships between the parameters.
Figure 15. Time and delay summary report
Figure 15 shows the clock arrival time and finish time, together with the maximum operating
frequency and path delay of the pipeline.
Figure 16. Frequency vs. Leakage Power & Dynamic Power
Figure 17. Frequency vs. Power
The Spartan 3E family FPGA device XC3S1600E, in the FG484 package, is used in this research to
obtain the parameters for the six pipeline stages. Figure 16 shows the relationship between
dynamic power, leakage power and frequency, and Figure 17 shows the relationship between
frequency and power.
7. COMPARATIVE ANALYSIS
Table 7. Power comparison of various pipeline models
Parameters [15]                          Process Technology   LUTs   Frequency (MHz)   Power (W)
Spartan3: XC3S1500L-4FG676               90 nm                417    98.090            0.144
Virtex5: XC5VFX30T-3FF665                65 nm                300    321.048           0.777
Virtex6: XC6VLX75T-3FF784                40 nm                307    401.881           1.440
Virtex6 Low Power: XC6VLX75TL-IL-FF784   40 nm                307    335.233           0.920
Proposed Model                           90 nm                421    285.583           0.129
Figure 18. Performance Comparison of various process cores
In the device comparison graph, the devices are compared on parameters such as process
technology, LUTs, frequency and power. The proposed model (Spartan 3E) consumes the least power,
0.129 W, of the devices compared.
Table 8. Frequency and LUT comparison of various pipeline models
Parameters             Proposed Model   GPPM [16]   Low Power MIPS [17]   MIPS Core [18]   Tiny CPU [19]
Max. Frequency (MHz)   285.583          277.9       205.7                 95.5             89
LUT                    421              1168        1890                  2340             336
In the frequency comparison, the proposed model attains the highest maximum operating frequency
(285.583 MHz) while using 421 LUTs. Compared with the other models, the proposed model achieves
better results in terms of speed, and it uses fewer LUTs than every model except Tiny CPU
(336 LUTs).
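The comparison in Table 8 can be expressed as percentages with the following Python sketch. The figures are those listed in the table; the function name `relative_to_proposed` and the choice of percentage frequency gain and percentage LUT saving as metrics are assumptions made for illustration.

```python
# (maximum frequency in MHz, LUT count) as listed in Table 8.
TABLE_8 = {
    "Proposed Model": (285.583, 421),
    "GPPM": (277.9, 1168),
    "Low Power MIPS": (205.7, 1890),
    "MIPS Core": (95.5, 2340),
    "Tiny CPU": (89.0, 336),
}


def relative_to_proposed(table, ref="Proposed Model"):
    """For each other model, return (frequency gain %, LUT saving %) of the reference model."""
    f_ref, lut_ref = table[ref]
    return {
        name: (round(100 * (f_ref / f - 1), 1), round(100 * (1 - lut_ref / lut), 1))
        for name, (f, lut) in table.items()
        if name != ref
    }
```

With the table values, the proposed model is about 2.8% faster than GPPM while using about 64% fewer LUTs, and about 220.9% faster than Tiny CPU while using about 25.3% more LUTs (a negative saving).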
[Figure 18 chart: Process Technology (nm), LUTs, Frequency (MHz) and Power (mW) for Spartan3, Virtex5, Virtex6, Virtex6LP and Our Model]
Figure 19. Frequency Comparison
8. CONCLUSION
In this research, a 6-stage pipelined MIPS RISC processor is implemented on a Spartan 3E family
device. Low-power and high-speed techniques were employed to reduce the power and increase the
speed. The proposed model consumes a total power of 0.129 W, which is less than its
counterparts. Similarly, its maximum operating frequency is 285.583 MHz, which is about 2.8%
higher than the 277.9 MHz of the GPPM model (Table 8), which uses the same Spartan 3E device.
Moreover, fewer elements are utilized, thereby optimizing the area of this design.
Various hazards are analyzed, and possible remedies are presented. The dynamic power, leakage
power and total power are measured at various frequencies. Various parameters relating to this
pipeline are shown through 3D and 2D graphs produced with the MATLAB tool. The time taken to
complete the whole pipeline process, including propagation delay, is also noted and is of the
order of nanoseconds. Verilog HDL code with the Xilinx software tool is used to obtain the
simulation results.
In comparison with various pipelining models, the proposed model compares favourably in terms of
power, speed, and area utilization.
9. FUTURE SCOPE
The proposed pipeline model uses a Spartan 3E device with a normal pipeline. Virtex 7 devices, by
contrast, could support a superscalar pipeline, which is faster than normal pipelining and
superpipelining. In this work, the power gating technique is used to minimize power. Other power
minimization techniques could be combined with it to reduce the power further and to increase the
speed. Spartan 7 devices have more advanced features than Spartan 3E and could be used with a
larger number of pipeline stages, which eases the process.
[Figure 19 chart: Frequency (MHz) for Our Model, GPPM, Low Power MIPS, MIPS Core and Tiny CPU]
REFERENCES
[1] Rashid F. Olanrewaju, Fawwaj E Fajingbesi, S.B. Junaid, Ridzwan Alahudin, Farhat Anwar & Bisma
Rasool Pampori (2017) “Design and Implementation of a Five Stage Pipelining Architecture Simulator
for RiSC-16 Instruction Set”, Indian Journal of Science and Technology, Vol. 10, No. 3, pp 1-9.
[2] Vijaykumar J, Nagaraju B, Swapna C & Ramanujappa T (2014 April) “Design and Development of
FPGA based Low Power pipelined 64-bit RISC processor with Double precision Floating point Unit”,
International Conference on Communication and Signal Processing.
[3] Saranya Krishnamurthy, Ramani Kannan, Erman Azwan Yahya & Kishore Bingi (2017) “Design of
FIR Filter using Novel pipelined Bypass Multiplier”, IEEE 3rd International Symposium on Robotics
and Manufacturing Automation, pp 1-6.
[4] Sneha Mangalwedhe, Roopa Kulkarni & S. Y. Kulkarni (2017) “Low Power Implementation of 32-bit
RISC Processor with pipelining”, 2nd International Conference on Microelectronics, Computing &
Communication Systems (MCCS-2017), Bangalore.
[5] Husainali S Bhimani, Hitesh N. Patel & Abhishek A Davda (2016) “Design of 32-bit 3-stage pipelined
processor based on MIPS in Verilog HDL and implementation on FPGA Virtex7”, International Journal
of Applied Information Systems, Vol. 10, No. 9.
[6] Rakesh M.R. (2014 April) “RISC Processor Design in VLSI Technology Using the Pipeline
Technique”, International Journal of Innovative Research in Electrical, Electronics, Instrumentation and
Control Engineering, Vol. 2, No. 4.
[7] Indu M & Arun Kumar M. (2013 August) “Design of Low Power Pipelined RISC
Processor”, International Journal of Advanced Research in Electrical Electronics and Instrumentation
Engineering, Vol. 2, No. 8.
[8] Priyanka Trivedi & Rajan Prasad Tripathi (2015) “Design & Analysis of 16 bit RISC Processor Using
Low Power Pipelining”, International Conference on Computing, Communication and Automation, pp
1294-1297.
[9] Charu Sharma & Gurupreet Singh Saini (2017 June) “Design and Analysis of High Performance RISC
Processor using Hyperpipelining Technique”, IJASRE, Vol. 3, No. 5, pp 200-206.
[10] Meera S & Umamaheshwari D “Genetic Algorithm for Leakage Reduction through IVC using Verilog”,
International Journal of Microelectronics Engineering, Vol. 1, No. 1, pp 51-62.
[11] Zulkifli M, Yudhanto Y.P, Soetharyo N.A & Adiono T (2009, August) “Reduced Stall MIPS
Architecture using Pre-Fetching Accelerator”, International Conference on Electrical Engineering and
Informatics, IEEE.
[12] Md. Ashraful Islam, Md. Yeasin Arafath & Md. Jahid Hasan (2014, December) “Design of DDR4
SDRAM Controller”, 8th International Conference on Electrical and Computer Engineering, Dhaka,
Bangladesh.
[13] Liang Geng, Ji-zhong Shen & Cong-yuan Xu (2016) “Power-efficient dual-edge implicit pulse-triggered
flip-flop with an embedded clock-gating scheme”, Frontiers of Information Technology and Electrical
Engineering, Vol. 17, No. 9, pp 962-972.
[14] Aruljothi K, Prajitha PB & Rajaprabha R (2014) “Leakage Power reduction using Power gating and
Multi-vt technique”, International Journal of Advanced Research in Computer Engineering and
Technology, Vol. 3, No. 1.
[15] Narender Kumar & Munish Rattan (2015 December) “Implementation of Embedded RISC processor
with Dynamic Power Management for Low-Power Embedded system on SOC”, IEEE Proceedings of
2015 RAECS.
[16] Nishant Kumar & Ekta Aggrawal (2013, September) “General Purpose Six-Stage Pipelined
Processor”, International Journal of Scientific & Engineering Research, Vol. 4, No. 9.
[17] Mamun Bin Ibne Reaz, Shabiul Islam & Mohd. S. Sulaiman (2002 December) “A single Clock Cycle
MIPS RISC Processor Design using VHDL”, In proceedings of IEEE International Conference on
Semiconductor Electronics, Penang, Malaysia, pp 199-203.
[18] Gautham P, Parthasarathy R. & Karthi Balasubramanian (2009 December) “Low Power Pipelined MIPS
Processor Design”, In proceedings of IEEE International Conference on Integrated Circuits, pp 462-465.
[19] Koji Nakano, Kensuke Kawakami, Koji Shigemoto, Yuki Kamada & Yasuaki Ito (2008) “A Tiny
Processing System for Education and Small embedded Systems on the FPGAs”, In IEEE International
Conference on Embedded and Ubiquitous Computing.
AUTHORS
PONUGUMATLA INDIRA,
M. Tech., MBA, M.Sc. (Psych), (Ph.D.).
Assistant Professor, GITAM University, Hyderabad
DR. M. KAMARAJU,
M. Tech., Ph.D.
Professor, ECE Dept.,
Gudlavalleru Engineering College, JNTUK,
Krishna District, Andhra Pradesh, India
DR. VED VYAS DWIVEDI,
M. Tech., Ph.D.
Professor, ECE Dept.
Pro-Vice Chancellor, CU Shah University,
Wadhwan – Gujarat, India