This document proposes a NoC-based infrastructure to enable dynamic self-reconfigurable systems (DSRSs). The key components of a DSRS include reconfigurable interfaces, repositories to store configurations, configuration ports to access the reconfigurable fabric, and a configuration controller. A network-on-chip (NoC) is proposed as the communication infrastructure to actively support partial and dynamic reconfiguration of IPs. The document outlines design choices for each component and presents two case studies comparing area overhead and reconfiguration time.
The document discusses the Chameleon Chip, a reconfigurable processor that can rewire itself dynamically to adapt to different software tasks. It contains reconfigurable processing fabric divided into slices that can be reconfigured independently. Algorithms are loaded sequentially onto the fabric for high performance. The chip architecture includes an ARC processor, memory controller, PCI controller, and programmable I/O. Its applications include wireless base stations, wireless local loops, and software-defined radio.
Coarse grained hybrid reconfigurable architecture with NoC router - Dhiraj Chaudhary
The document describes a coarse-grained reconfigurable architecture with a Network-on-Chip router that is designed to perform variable block size motion estimation for video compression. The architecture uses intelligent NoC routers to direct reference block data between processing elements, reducing data interactions with external memory and decreasing execution time. The paper also proposes two enhancements that reduce the architecture's area by 4.8% and router power consumption by 42%.
This document describes a coarse-grained reconfigurable architecture with a Network-on-Chip (NoC) router designed for variable block size motion estimation. The architecture contains 16 processing elements arranged in a 2D array that can calculate Sum of Absolute Differences (SAD) for different block sizes. An NoC with intelligent routers is used to direct reference block data between processing elements to reduce memory interactions and improve execution time. The architecture supports fast search algorithms like diamond search that further improve efficiency over full search.
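As an illustration of the SAD metric and full search mentioned above, here is a minimal software sketch. It is not the paper's hardware design: the function names (`sad`, `block`, `full_search`), the list-of-lists frame layout, and the search-window convention are all assumptions made for this example.

```python
def sad(current, reference):
    """Sum of absolute differences between two equally sized 2D blocks."""
    if len(current) != len(reference) or any(
        len(a) != len(b) for a, b in zip(current, reference)
    ):
        raise ValueError("blocks must have the same shape")
    return sum(
        abs(c - r)
        for row_c, row_r in zip(current, reference)
        for c, r in zip(row_c, row_r)
    )


def block(frame, top, left, height, width):
    """Cut a height x width block out of a frame stored as a list of rows."""
    return [row[left:left + width] for row in frame[top:top + height]]


def full_search(current_block, ref_frame, top, left, search_range):
    """Try every displacement within +/- search_range around (top, left).

    Returns (best_sad, dy, dx), or None if no candidate fits in the frame.
    Candidates that fall outside the reference frame are skipped.
    """
    h, w = len(current_block), len(current_block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(ref_frame) or x + w > len(ref_frame[0]):
                continue
            cost = sad(current_block, block(ref_frame, y, x, h, w))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

Because the block dimensions are taken from the input, the same routine covers different block sizes. A fast search such as diamond search would evaluate only a subset of these candidate displacements instead of all of them.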
Coarse Grained Hybrid Reconfigurable Architecture with NoC Router for Variabl... - Dhiraj Chaudhary
This document describes a coarse-grained reconfigurable architecture with a Network-on-Chip (NoC) router designed for variable block size motion estimation. The architecture contains 16 processing elements arranged in a 2D array that can calculate Sum of Absolute Differences (SAD) for different block sizes. An NoC with intelligent routers is used to direct reference block data between processing elements to reduce memory interactions and increase computation efficiency. The architecture supports fast search algorithms like diamond search that further improve performance over full search.
Coarse grained hybrid reconfigurable architecture with NoC router for variabl... - Dhiraj Chaudhary
The document describes a coarse-grained reconfigurable architecture with a Network-on-Chip router that is designed to perform variable block size motion estimation for video compression. The architecture uses intelligent NoC routers to direct reference block data between processing elements, reducing data interactions with external memory and decreasing execution time. The paper also proposes two enhancements that reduce the architecture's area by 4.8% and router power consumption by 42%.
Below is a link to a paper to be presented at the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA'11) that describes SRC Computers' efficient toolflow for FPGA development, which has demonstrated success in cost-effective migration of applications to hardware.
The design of high-performance digital signal processing (DSP) processors for software-defined radio (SDR) that combine a high degree of flexibility with low power consumption has been a major challenge for the scientific community ever since SDR was conceived. The basic philosophy of SDR is to implement different modulation or demodulation schemes on the same underlying hardware. Currently available high-performance DSP processors, optimized with a very long instruction word (VLIW) architecture and multiply-and-accumulate (MAC) units, cannot meet the near-real-time speed requirements of SDR because they execute compute-intensive signal processing algorithms sequentially. Moreover, their power dissipation is considerably high. Although application-specific integrated circuits (ASICs) exhibit high performance, they are also unsuitable because they lack flexibility. Various FPGA-based implementations of reconfigurable architectures for SDR have also been reported. However, the look-up table (LUT) based implementations of FPGAs are not optimal and therefore cannot offer the highest performance at low silicon cost. With this in view, this paper presents the design of a configurable communication processor for SDR. The proposed scheme combines the performance of an ASIC-based design with the flexibility of software. Experimental results reveal that the proposed architecture has a minimal hardware requirement, improved silicon area utilization, and low power dissipation.
The document provides an introduction to systems approaches and system architecture. It discusses how system architecture has evolved over time to deal with increasing complexity as transistor density has grown exponentially. A system-on-chip architecture combines various processors, memories, and interconnects tailored for a specific application domain. The document then discusses the key components of systems, including different types of processors, memories, and interconnects. It also covers the tradeoffs between hardware and software implementations and different processor architectures used in systems-on-chip.
An octa core processor with shared memory and message-passing - eSAT Journals
Abstract: In this era of fast, high-performance computing, efficient optimizations are needed both in the processor architecture and in the memory hierarchy. Advances in communication and multimedia applications keep pushing up the number of cores in the main processor: dual-core, quad-core, octa-core, and so on. Enhancing the overall performance of a multiprocessor chip, however, places stringent requirements on inter-core synchronization. Thus, an MPSoC with 8 cores supporting both message-passing and shared-memory inter-core communication mechanisms is implemented on a Virtex 5 LX110T FPGA. Each core is based on the MIPS III (Microprocessor without Interlocked Pipelined Stages) ISA, handles only integer instructions, and has a six-stage pipeline with a data hazard detection unit and forwarding logic. The eight processing cores and one central shared-memory core are interconnected by a network-on-chip (NoC) with a 3x3 2-D mesh topology and virtual-channel routers. The router is a four-stage pipeline that supports the dimension-ordered (DOR) X-Y routing algorithm with round-robin arbitration. To verify and functionally test the fully synthesized multicore processor, a matrix multiplication is mapped onto it. The multiplications and additions for each element of the result matrix are partitioned and scheduled across the eight cores to maximize throughput. All processor design code is written in Verilog HDL. Keywords: MPSoC, message-passing, shared memory, MIPS, ISA, wormhole router, network-on-chip, SIMD, data level parallelism, 2-D Mesh, virtual channel
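To make the dimension-ordered X-Y routing named in the abstract concrete, here is a minimal Python sketch of the hop sequence on a small 2-D mesh. It is a behavioural illustration only, not the paper's Verilog router: the function name `xy_route`, the `(x, y)` coordinate convention, and the default 3x3 size are assumptions made for this example.

```python
def xy_route(src, dst, width=3, height=3):
    """Return the list of (x, y) nodes visited from src to dst.

    Dimension-ordered X-Y routing: the packet first travels along the
    X dimension until its column matches the destination, and only then
    travels along the Y dimension.
    """
    for x, y in (src, dst):
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError("node outside the mesh")
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:          # then Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

Each route is a shortest path, so the hop count equals the Manhattan distance between the two nodes. Because a packet never turns from the Y dimension back into the X dimension, this scheme is deadlock-free on a mesh.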
Synergistic processing in cell's multicore architecture - Michael Gschwind
The document discusses the Cell Broadband Engine architecture, which was designed to improve performance over desktop systems by an order of magnitude. It has a heterogeneous multicore design with one Power processor element for control tasks and eight synergistic processor units (SPUs) for data processing. The SPU architecture implements a novel pervasively data-parallel approach that combines scalar and SIMD processing on wide data paths to improve efficiency. This enables more processing cores to fit on a chip for high thread-level parallelism.
Designing of telecommand system using system on chip soc for spacecraft contr... - IAEME Publication
Emerging developments in semiconductor technology have made it possible to design an entire system on a single chip, commonly known as a System-on-Chip (SoC). Space systems' growing need for on-board data processing can be addressed by optimizing SoCs to provide cost-effective, high-performance, and reliable data processing. This is achieved by embedding pre-designed functions, in the form of specialized reusable intellectual property (IP) cores, into a single complex chip. This paper is concerned with the design of a telecommand system for transferring signals from a ground station to a space station by integrating SRAM (Static Random Access Memory), an ARM (Advanced RISC Machine) processor, an EDAC (Error Detection and Correction) unit, and a CCSDS (Consultative Committee for Space Data Systems) decoder. The telecommand SoC is designed in Verilog. It is implemented on a Xilinx FPGA platform, and its functionality is verified by ModelSim simulation. The results are analyzed for a Spartan-3E device and an ARM board, and two devices are controlled by the signal transfer.
Designing of telecommand system using system on chip soc for spacecraft contr... - IAEME Publication
This document describes the design of a telecommand system using a System on Chip (SoC) for spacecraft control applications. It involves integrating various components onto a single chip, including SRAM, an ARM processor, and an Error Detection and Correction (EDAC) unit. The telecommand data is received and stored in the SRAM. The EDAC unit uses a Hamming code to detect and correct any errors in the data before it is processed by the ARM processor. The processor then collects onboard data signals and produces the output result. Verilog code is used to design the SoC, which is implemented and tested on a Xilinx FPGA platform and ARM board. The SoC allows two devices to be controlled by
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING - cscpconf
The document describes a homogeneous multistage architecture for real-time image processing. It proposes a parallel architecture using multiple identical processing elements connected by different communication links. As an example application, it discusses a multi-hypothesis approach for road recognition, which uses multiple hypotheses to detect and track road edges in video in real time. Experimental results on an FPGA demonstrate that the architecture can detect roadsides in images within 60 milliseconds.
DYNAMIC HW PRIORITY QUEUE BASED SCHEDULERS FOR EMBEDDED SYSTEM - ijesajournal
A real-time operating system (RTOS) is often used in embedded systems to structure the application code and to ensure that deadlines are met by reacting to events and executing the corresponding functions within a precise time. Most embedded systems are bound by real-time constraints, with determinism and latency as critical metrics. RTOSs are generally implemented in software, which increases computational overhead, jitter, and memory footprint. Modern FPGA technology enables the implementation of a full-featured and flexible hardware-based RTOS, which greatly reduces these overheads even if it does not remove them completely. Scheduling algorithms play an important role in the design of real-time systems. The adaptive fuzzy inference system (FIS) based scheduler framework proposed in this article builds on the conclusions drawn from years of research in the HW/SW co-design domain. The proposed two-phase FIS-based adaptive hardware task scheduler minimizes the processor time spent on scheduling. It uses fuzzy logic to model uncertainty in the first stage, along with an adaptive framework whose feedback allows the processor share of tasks running on a multiprocessor to be controlled dynamically at runtime. This fuzzy-logic-based adaptive hardware scheduler overcomes the limit on the total number of tasks and thus improves the efficiency of the entire real-time system. The increased computation overhead of the proposed two-phase FIS scheduler can be compensated by exploiting the inherent parallelism of hardware once the scheduler is migrated to the FPGA.
The document discusses Brocade's network solutions for intelligence, surveillance, and reconnaissance (ISR) systems. Brocade provides solutions throughout the ISR architecture, including at signal acquisition points, base ground stations for signal distribution, and client data centers. At signal acquisition points, Brocade's switches can be deployed in ruggedized environments and provide effective and economical operation. At base ground stations, Brocade routers efficiently handle multicast traffic distribution to client data centers. Brocade also offers flexible data center solutions for processing large amounts of ingested data at client sites.
Robust Fault Tolerance in Content Addressable Memory Interface - IOSRJVSP
With the rapid growth in data exchange, large memory devices have emerged in the recent past. Controlling the operation of such large memories has become a tedious task because of the faster, distributed nature of the memory units. During memory access, data being written or fetched often encounters a faulty location, so faulty data is written to or fetched from the addressed location. In real-time applications this error cannot be tolerated, since it alters operating conditions that depend on the memory data. Hence, optimal fault-tolerance control is required in content-addressable memory. In this paper, we present a fault-tolerance approach that controls the fault-addressing overhead by introducing a new addressing scheme based on redundant control modeling of the fault address unit. The presented approach achieves fault control over multiple fault locations in different dimensions with redundant coding.
Vesyla is a high-level synthesis framework that maps DSP algorithms onto a coarse-grain reconfigurable architecture. It takes untimed C code as input and uses pragmas to guide the mapping and generation of configuration files for the architecture. The pragmas identify parallelism and allocate and bind operations and operands to resources. This allows the user to explore different architectural implementations from serial to fully parallel. Vesyla analyzes dependencies, schedules operations, and synchronizes parallel threads to generate the configuration files.
System on Chip is an IC that integrates all the components of an electronic system. This presentation is based on the current trends and challenges in IP-based SoC design.
From Rack scale computers to Warehouse scale computers - Ryousei Takano
This document discusses the transition from rack-scale computers to warehouse-scale computers through the disaggregation of technologies. It provides examples of rack-scale architectures like Open Compute Project and Intel Rack Scale Architecture. For warehouse-scale computers, it examines HP's The Machine project using application-specific cores, universal memory, and photonics fabric. It also outlines UC Berkeley's FireBox project utilizing 1 terabit/sec optical fibers, many-core systems-on-chip, and non-volatile memory modules connected via high-radix photonic switches.
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGA - IRJET Journal
This document discusses the implementation of a lightweight VLSI design for the HIGHT cipher on an FPGA. It begins with an introduction to lightweight VLSI architecture and its applications in low-resource devices. It then provides background on the HIGHT cipher and discusses prior work implementing cryptographic algorithms on FPGAs. The document goes on to describe the proposed VLSI design for the HIGHT cipher, which is optimized for size, power, and speed. It achieves a throughput of 25 Mbps with an encryption/decryption delay of 0.64 ms. Evaluation results demonstrate the effectiveness and suitability of the design for low-power applications.
This document presents benchmarks to analyze the memory subsystem performance of multicore processors from AMD and Intel. The benchmarks measure latency and bandwidth for different cache coherence states and locations in the memory hierarchy. Testing was done on dual-socket systems using AMD Opteron 2300 (Shanghai) and Intel Xeon 5500 (Nehalem-EP) quad-core processors. Results show significant performance differences driven by each processor's distinct cache architecture and coherence protocol implementations.
SENSOR SIGNAL PROCESSING USING HIGH-LEVEL SYNTHESIS AND INTERNET OF THINGS WI... - pijans
Sensor routers play a crucial role in Internet of Things (IoT) applications, in which the network's capacity for transmitting signals from cloud systems to sensors, and in the reverse direction, is limited. The paper describes a robust framework with several architectural layers that processes data using high-level synthesis. It is designed to sense nodes automatically with the help of the Internet of Things, where the applications reside in cloud systems. The paper proposes a new four-layer framework architecture with embedded PEs to sense the devices of IoT applications, supported by a high-level synthesis DBMF (database management function) tool.
Similar to A NoC-Based Infrastructure To Enable Dynamic Self Reconfigurable Systems
The design of high performance Digital Signal Processing (DSP) Processors for Software Defined Radio (SDR) with high degree of flexibility and low power consumption has been a major challenge to the
scientific community ever since its conception. The basic philosophy of SDR is to implement different modulation or demodulation schemes on the same underlying hardware. Currently available high performance DSP processors, optimized with ‘Very Large Instruction Word (VLIW)’ architecture and multiply and accumulate (MAC) units, are unable to meet the near real time speed requirements of
Software Defined Radios (SDR) due to their inherent sequential execution of compute intensive signal
processing algorithms. Moreover, their power dissipation is considerably high. Even though, Application Specific Integrated Circuits (ASIC) exhibit high performance, they are also not suitable because of their lack of flexibility. Various references on FPGA based implementations of reconfigurable architectures for SDRs are also available. However, the Look-up Table (LUT) based implementations of FPGAs are not
optimum and therefore, cannot offer highest performance at low silicon cost. Keeping this view, this paper presents the design of a configurable communication processor for Software Defined Radio. The proposed
scheme features the performance of an ASIC based design combined with the flexibility of software. Experimental results reveal that the proposed architecture has minimum hardware requirement, improved silicon area utilization and low power dissipation.
The document provides an introduction to systems approaches and system architecture. It discusses how system architecture has evolved over time to deal with increasing complexity as transistor density has grown exponentially. A system-on-chip architecture combines various processors, memories, and interconnects tailored for a specific application domain. The document then discusses the key components of systems, including different types of processors, memories, and interconnects. It also covers the tradeoffs between hardware and software implementations and different processor architectures used in systems-on-chip.
An octa core processor with shared memory and message-passingeSAT Journals
Abstract This being the era of fast, high performance computing, there is the need of having efficient optimizations in the processor architecture and at the same time in memory hierarchy too. Each and every day, the advancement of applications in communication and multimedia systems are compelling to increase number of cores in the main processor viz., dual-core, quad-core, octa-core and so on. But, for enhancing the overall performance of multi processor chip, there are stringent requirements to improve inter-core synchronization. Thus, a MPSoC with 8-cores supporting both message-passing and shared-memory inter-core communication mechanisms is implemented on Virtex 5 LX110T FPGA. Each core is based on MIPS III (Microprocessor without interlocked pipelined stages) ISA, handling only integer type instructions and having six-stage pipeline with data hazard detection unit and forwarding logic. The eight processing cores and one central shared memory core are inter connected using 3x3 2-D mesh topology based Network-on-chip (NoC) with virtual channel router. The router is four stage pipelined supporting DOR X-Y routing algorithm and with round robin arbitration technique. For verification and functionality test of above fully synthesized multi core processor, matrix multiplication operation is mapped onto the above said. Partitioning and scheduling of multiple multiplications and addition for each element of resultant matrix has been done accordingly among eight cores to get maximum throughput. All the codes for processor design are written in Verilog HDL. Keywords: MPSoC, message-passing, shared memory, MIPS, ISA, wormhole router, network-on-chip, SIMD, data level parallelism, 2-D Mesh, virtual channel
A NoC-Based Infrastructure To Enable Dynamic Self Reconfigurable Systems
Leandro Möller¹, Ismael Grehs², Ewerson Carvalho², Rafael Soares², Ney Calazans², Fernando Moraes²
¹ Darmstadt University of Technology – Institute of Microelectronic Systems
Karlstr. 15, 64283 Darmstadt, Germany
moller@mes.tu-darmstadt.de
² Catholic University of Rio Grande do Sul (FACIN-PUCRS)
Av. Ipiranga, 6681 - Prédio 16 - 90619-900 - Porto Alegre - RS - BRASIL
{grehs, ecarvalho, rsoares, calazans, moraes}@inf.pucrs.br
Abstract
The demand for electronic equipment with higher performance,
lower power consumption, and smaller size motivates research
into more efficient design methods. Platform-based design is
a method to implement complex SoCs that avoids design
from scratch. Usually, a SoC built with platform-based design
includes one or more processors, a real-time operating
system, intellectual property (IP) blocks, memories and an
interconnection infrastructure. An advantage of processors
is flexibility at the software level. Hardware, by contrast,
is not flexible, so dedicated IP blocks must be inserted at
design time. An alternative is to provide the platform with
reconfigurable hardware blocks with sufficient capacity to
implement any envisaged dedicated IP block. Dynamic
self-reconfigurable systems (DSRSs) introduce flexibility to
hardware. In DSRSs, IP blocks are loaded according to
application demand, an approach that potentially reduces
area, power consumption and total system cost.
1. Introduction
Platform-based design [1] is a method to implement
complex SoCs, avoiding chip design from scratch. Several
IPs other than processors compose SoCs. Examples are
communication interfaces, memory controllers and
hardware accelerators. These IPs, as well as processors, may
be implemented directly in silicon or using reconfigurable
hardware technology. Using the second option, it becomes
possible to: (i) improve system performance, by migrating
critical tasks to hardware; (ii) build products in smaller
devices, thus reducing costs; (iii) extend product life cycle;
(iv) update hardware after system manufacturing.
In order to accomplish (i) and (ii), reconfigurable
hardware must allow partial and dynamic reconfiguration.
Systems using these characteristics are called Dynamically
Reconfigurable Systems (DRSs). The main drawback of
DRSs is their reconfiguration time. To minimize this
drawback, DRSs may be built with the capacity to manage
their own reconfiguration process. This can be achieved
through the availability of internal reconfiguration ports.
Such systems are named Dynamic Self-Reconfigurable
Systems (DSRSs) [2]. DSRSs are the target architecture of
this work.
One natural implementation choice for DSRSs is
dedicated ASICs with embedded reconfigurable areas. As
the goal of this paper is to propose an infrastructure for
DSRS, fine-grain reconfigurable FPGAs are used here as a
device platform for proof-of-concept purposes. Current
FPGAs are clearly limited in terms of useful silicon area,
since most of the silicon area is used for programming
purposes. In addition, DSRSs may waste a significant
amount of this useful silicon to implement the necessary
infrastructure. Despite these drawbacks, FPGAs are
certainly adequate to prototype the infrastructure proposed
herein, serving to demonstrate its benefits, gains and
limitations.
An important issue in current SoC design is the
implementation of its communication infrastructure.
Present SoCs require using scalable communication
infrastructures, with shorter wires to minimize power
consumption [3]. Networks on chip (NoCs) are an
alternative to busses, with several advantages, as stated in
[4]. However, few works [5] have suggested mixing
reconfigurable IPs and NoCs.
This paper has four goals. First, to propose an
infrastructure for DSRSs, identifying which are its required
components. The second goal is to present a
straightforward design flow supporting DSRSs. The third
goal is to describe a NoC actively supporting the process of
partial and dynamic IP reconfiguration. The last goal is to
depict proof-of-concept case studies, comparing area
overhead and reconfiguration time.
The rest of this paper is organized as follows. A
discussion about DSRS implementation alternatives is the
subject of Section 2. Section 3 presents the Artemis NoC
architecture. Section 4 presents a practical design flow to
build DSRSs. Section 5 presents and compares two DSRS
case studies. Finally, Section 6 presents some conclusions
and directions for future work.
2. DSRS Infrastructure
This Section discusses choices and trade-offs associated
with the DSRS infrastructure, drawing parallels with existing
work and recommending implementation choices for each
internal component. Figure 1 depicts these components in a
DSRS conceptual architecture. The communication
infrastructure is presented in Section 3.
Figure 1 - DSRS conceptual architecture.
(Diagram labels: SoC; fixed SoC area; reconfigurable SoC area; fixed IPs;
reconfigurable IPs; reconfigurable interfaces; repository; configuration
port; configuration controller; communication infrastructure.)
2.1. Repositories
DSRSs need access to repositories able to store a
potentially large number of partial configurations, often
called configuration memories. Besides storing partial
configurations, these repositories should offer fast access to
their contents, to satisfy application requirements. There are
basically four device types available to use as configuration
memories: (i) memory internal to the reconfigurable
device, usually available as RAM blocks or BRAMs; (ii)
devices external to the DSRS using static RAM
technology, or SRAMs; (iii) devices external to the DSRS
using PROM technology, such as EPROM or Flash devices
called generically PROMs; (iv) devices external to the
DSRS using DRAM technology, such as SDRAM.
Applications using BRAMs as repository can support only
a small number of configurations and/or only small
configurations, due to the limited capacity of BRAMs. Applications
that benefit from difference-based [6] reconfiguration
techniques are among those able to employ this kind of
repository.
SRAM and DRAM devices present a good compromise
between access speed and storage capacity. SRAMs
require simpler controllers in the DSRS, but are
much more expensive per bit than DRAMs. DRAMs, on
the other hand, have a low cost per storage bit, allowing
more configurations to be stored. However, a larger area of the
DSRS must be committed to implementing the DRAM controller.
Contrary to the other three technologies, PROMs have
the advantage of keeping configurations after turning the
DSRS off. They cost more per bit than DRAMs, but imply
a simpler procedure at startup of the DSRS. Also, changing
the contents of the repository is more complicated than
with the other technologies.
2.2. Reconfigurable Interface
A reconfigurable interface is necessary to implement the
communication between a reconfigurable IP and the rest of
the DSRS. The interface proposed by Palma et al. [7] uses
two levels of tristate buffers in the input and output pins of
the reconfigurable IPs. One level of tristates belongs to the
reconfigurable IP and the other to the communication
infrastructure. Manual routing verification and manual
routing corrections are required, to ensure correct
connection between IPs. To reduce manual routing, Palma
employs a 1-bit data serial bus as communication
infrastructure.
Lim and Peattie propose a reconfigurable interface
called Bus Macro [6], which employs 8 tristate buffers.
Each macro allows the simultaneous exchange of 4 bits
between a reconfigurable area and another area, fixed or
reconfigurable. The advantage of this macro is that it
reduces manual routing. However, it also uses tristate
buffers, which are scarce resources in Xilinx FPGAs. The
use of such resources overconstrains designs with complex
reconfigurable interfaces.
Huebner et al. [8] propose a reconfigurable interface
also called Bus Macro (distinct from the Xilinx Bus Macro, and
herein named Huebner macro). This macro is a static bus
used to connect all reconfigurable IPs of the system. This
reconfigurable interface is composed of two unidirectional
busses, one carrying data from the reconfigurable area to
the fixed area and another carrying data in the opposite
direction. Each macro allows the simultaneous
transmission of 8 bits from a reconfigurable area to another
area, fixed or reconfigurable.
2.3. Configuration Ports
The external JTAG and SelectMap interfaces are
alternatives for implementing configuration ports for DRSs
that are not self-reconfigurable, where the configuration
controller is located outside the DRS. Although these
interfaces can be used for building DSRSs (using external
wiring connecting some of the reconfigurable device pins
to them), most Xilinx devices provide an Internal
Configuration Access Port (ICAP). The ICAP usually
constitutes the best choice for building DSRSs, since user
logic can reach it from inside the reconfigurable device.
2.4. Configuration Controller
The Authors of this paper have built two versions of
Configuration Controller (CC): (i) a pure hardware version
(CC-H); (ii) a mostly software version (CC-S). Table 1
compares these two implementations qualitatively.
CC-S is three times slower than CC-H. This
disadvantage is related to the inefficiency of the current
API furnished by Xilinx to give access to the ICAP. This API
requires the CC-S to fetch 512-word blocks of each partial
configuration and store them in a BRAM. Only after
caching these data does the API send configuration data to the
ICAP. The CC-H sends data directly from an external
memory to the ICAP, leading to a shorter reconfiguration time.
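The difference between the two transfer styles can be pictured with a small model. This is only an illustrative sketch: the function names, the Python list standing in for the BRAM cache, and the `write_word` callback standing in for the ICAP are assumptions made here, not the Xilinx API or the authors' hardware.

```python
from typing import Callable, Iterable, Iterator, List

BLOCK_WORDS = 512  # block size stated in the text


def blocks(words: Iterable[int], size: int = BLOCK_WORDS) -> Iterator[List[int]]:
    """Yield successive blocks of at most `size` words."""
    block: List[int] = []
    for word in words:
        block.append(word)
        if len(block) == size:
            yield block
            block = []
    if block:  # last, possibly shorter, block
        yield block


def buffered_transfer(words: Iterable[int], write_word: Callable[[int], None]) -> int:
    """CC-S style: cache a whole block first, then write it out.

    Returns the number of blocks that were cached.
    """
    cached_blocks = 0
    for cache in blocks(words):  # stands in for the BRAM cache
        for word in cache:
            write_word(word)
        cached_blocks += 1
    return cached_blocks


def direct_transfer(words: Iterable[int], write_word: Callable[[int], None]) -> None:
    """CC-H style: stream each word straight to the configuration port."""
    for word in words:
        write_word(word)
```

Both functions deliver the same words in the same order; the buffered variant only adds the intermediate caching step that the text identifies as the source of the slowdown.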
CC-S runs on an embedded 32-bit RISC processor
designed by Xilinx, MicroBlaze. The structure of CC-S
also includes peripheral device controllers, memory and a
communication infrastructure. If configuration control is
the only task assigned to this infrastructure, the approach
could hardly be justified. However, assuming that most
applications today require the use of one or more
processors inside the system, and assuming some of these
processors have spare time to perform the CC tasks, the
additional hardware for configuration control requires less
area than CC-H. Given the assumptions above and if the
application reconfiguration time requirements are not too
stringent, CC-S can be usefully applied.
Table 1 – Comparison of two CC implementations.
Characteristic        | CC-H                         | CC-S
Configuration speed   | Milliseconds                 | Milliseconds
Area                  | Requires additional hardware | If processor available, small area overhead (ICAP and macro controllers)
Modification easiness | Complex / extra area         | Simple / modifying software
Another important aspect regarding the design of CCs is
the ease of updating or adapting the CC to different
applications. When it is necessary to add
functionalities to the CC, a software implementation is
definitely more adequate, since complex tasks can be easily
implemented through programming. Examples of such
functionalities are configuration compression and on-the-fly
decompression, on-the-fly decryption, configuration
scheduling policies, and support for configuration
preemption. A hardware-only implementation such as CC-H
would require restructuring the CC design and
re-synthesizing the CC, and would probably increase the area
overhead of the controller.
2.5. DSRS Infrastructure
Table 2 presents some recommended infrastructure
choices for DSRSs. Software configuration controllers
allow greater flexibility. Their disadvantage, a higher
reconfiguration time, can be overcome by rewriting the
API that accesses the ICAP module, or by adding a small
hardware module to manage the ICAP directly.
Table 2 - DSRSs recommended infrastructure.
Infrastructure Element       | Recommended Choice
Configuration Controller     | Software
Reconfigurable Interface     | LUT-Macro
Repository                   | External SRAM
Reconfigurable Port          | ICAP
Communication Infrastructure | NoC
A recommended choice for the reconfigurable interface
is to use LUT-macros. Macros developed by Xilinx [6] use
a larger area when compared to the LUT-macros proposed
in current work (Section 3.2). The Xilinx Bus Macro
consumes CLBs from 6 distinct CLB columns, being two
in the fixed area and four in the reconfigurable area.
Meanwhile, LUT-macros occupy CLBs of only two CLB
columns, one at the fixed area and one at the reconfigurable
area. Another difference is the number of bits transported
by each macro: a Xilinx Bus Macro is 4-bit wide and LUT-
macro allows 8-bit wide transfers. CLB columns used for
both macros have reduced usability, due to placement and
routing restrictions imposed by the macros on both fixed
and reconfigurable areas [6].
Another recommendation is to use external static RAM
to store partial configurations, since the controller needed to access
these memories is very simple, their access time is short,
and their capacity is sufficient to store
several partial configurations. It is not advisable to spend
internal FPGA memory on partial configurations, since
the capacity of such memories is too small.
3. Artemis NoC
The last component of the proposed DSRS infrastructure
discussed here is the communication infrastructure. As
stated before, NoCs are good choices due to their
scalability, increased parallelism and short-range wires that
reduce power consumption. This work proposes Artemis, a
NoC that supports specific reconfiguration services and is
based on the Hermes NoC [9]. This Section describes the
modifications made to Hermes to allow its use in
DSRSs.
The partial reconfiguration process may produce
glitches in the interface between the IP under
reconfiguration and the rest of the device. These glitches
may introduce spurious data into the NoC, causing
malfunctions or even blocking the circuit. In addition, packets
transmitted to an area undergoing reconfiguration must be
discarded, since it is typically impossible to know whether these
packets are targeted at the previous configuration of this
area or at the next one. To avoid such
problems, a set of services must be added to the NoC to
enable its use in DSRSs.
Three services are implemented in Artemis: (i)
reconfigurable area insulation; (ii) packet discarding; (iii)
reconfigurable area reconnection. Two functionalities were
added to Hermes to support these services:
(i) definition of control packets, enabling IPs to send
packets to routers, not only to other IPs; (ii) capacity to
disconnect/connect routers from/to their associated
reconfigurable areas. These functionalities are detailed in
the next Sections.
3.1. Control packets: structure and function
The addition of two sideband signals per port to the
original Hermes router serves to differentiate control
packets from data packets. These signals, depicted in
Figure 2, are ctrl_in and ctrl_out. For each flit sent on
data_out, ctrl_out is asserted together with tx if the flit
belongs to a control packet. The target router receives flits
analogously, using the data_in, rx and ctrl_in signals.
When the reconfigurable area is insulated, the router
discards any data packets sent to the area under
reconfiguration. Insulation also protects the network, since
during reconfiguration transients can occur in the
reconfigurable interface. If such signals are considered,
spurious data may enter the NoC. Transients were indeed
observed in hardware by measuring the router-IP interface
with a logic analyzer during reconfiguration. These events
may signal a false packet to the router, with unpredictable
outcomes. Once the new IP is configured, a control packet
reconnects IP and router, enabling normal operation.
Figure 2 – Interface between Artemis routers.
(Diagram: the East port of one router faces the West port of its neighbour;
the signal pairs are ctrl_out/ctrl_in, tx/rx, data_out/data_in and
ack_tx/ack_rx, in both directions.)
The reception and forwarding of control and data
packets are similar. The major change in the router is the
addition of one bit at each position of the input buffer. This
is required to propagate the value of the ctrl_out signal to
the reconfigurable IP router. When the control packet
arrives at its destination router, it decodes and executes the
corresponding operation.
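The three services (insulation, packet discarding and reconnection) can be summarised with a small behavioural model of a router's local port. This is a sketch under stated assumptions: the class names and the operation codes are hypothetical, since the paper does not give the control packet encoding, and a whole packet is modelled as one object rather than as a sequence of flits.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ControlOp(Enum):
    # Hypothetical operation codes; the actual encoding is not given in the text.
    DISCONNECT = 0
    RECONNECT = 1


@dataclass
class Packet:
    payload: List[int]
    ctrl: bool = False  # models the ctrl_in / ctrl_out sideband bit


@dataclass
class RouterLocalPort:
    connected: bool = True
    delivered: List[List[int]] = field(default_factory=list)
    discarded: int = 0

    def receive(self, packet: Packet) -> None:
        if packet.ctrl:
            # Control packets are decoded and executed by the router itself.
            op = ControlOp(packet.payload[0])
            self.connected = op is ControlOp.RECONNECT
        elif self.connected:
            # Normal operation: data reaches the reconfigurable IP.
            self.delivered.append(packet.payload)
        else:
            # Area insulated: data packets are dropped.
            self.discarded += 1
```

A short usage example: a data packet sent between a DISCONNECT and a RECONNECT control packet is counted in `discarded`, while packets sent before and after are appended to `delivered`.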
3.2. Reconfigurable IP to router interface
This work proposes a new reconfigurable interface that
does not impose the use of a specific communication
infrastructure. This interface uses LUTs. Two
unidirectional macros compose the reconfigurable
interface, as depicted in Figure 3. The first one, named
F2R, is responsible for sending data from the fixed part of the
system to a reconfigurable IP, while the second one, named
R2F, implements the communication in the opposite
direction. Both macros allow the simultaneous transmission
of 8 data bits. The F2R macro is an identity function, while
the R2F macro gates its outputs with a control signal so that
transient glitches produced during reconfiguration do not
propagate from the reconfigurable area to the fixed area.
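The logic function of the two macros can be written down in a few lines. The sketch below is a bit-level model only, not a description of the LUT placement; the function names are chosen here for illustration, and it assumes a single control bit shared by all eight LUTs of an R2F macro.

```python
MASK8 = 0xFF  # each macro carries 8 data bits


def f2r(data_in: int) -> int:
    """F2R macro: identity function on 8 bits (fixed -> reconfigurable)."""
    return data_in & MASK8


def r2f(data_in: int, control: int) -> int:
    """R2F macro: every output bit is (in AND control).

    With control = 0 the outputs are forced to 0, so whatever the
    reconfigurable side drives cannot reach the fixed area.
    """
    gate = MASK8 if control else 0
    return data_in & gate
```

For instance, `r2f(0xA5, 1)` passes the value through unchanged, while `r2f(0xA5, 0)` yields 0 regardless of the input.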
Figure 3 – Proposed macros: (a) F2R; (b) R2F.
(Diagram: each macro spans one CLB in the fixed area and one CLB in the
reconfigurable area and carries 8 bits. In F2R the LUTs are configured with
the identity function, out = in. In R2F the LUTs are configured as two-input
AND gates, out = in AND control.)
The complete interface between the Artemis router and a
reconfigurable IP appears in Figure 4. It uses two R2F
macros to connect 10 bits from right to left and two F2R
macros to connect 11 bits in the inverse direction. The
interface between the router and the reconfigurable IP does
not contain the ctrl_in and ctrl_out signals because
reconfigurable IPs neither send nor receive control packets.
The reset is a global signal used to initialize the entire
system. The router asserts the reconf signal to initialize the
reconfigurable core connected to the local port. The
reconf_n signal in Figure 4 connects to the control signal in
Figure 3, controlling the connection from the router to the
reconfigurable core.
Figure 4 – Router to reconfigurable core interface.
(Diagram: the local port of a router, shown with its N, E, W and S ports,
connects to the reconfigurable core through two F2R and two R2F macros with
8-bit buses; the signals are tx, rx, data_in, data_out, ack_tx, ack_rx,
reset, reconf and reconf_n.)
4. Design Flow for DRS
The layout of reconfigurable IPs must satisfy some properties:
(i) the logic of a reconfigurable region must lie inside it
(achieved with placement restrictions); (ii) the wires of a
reconfigurable region must lie inside it (achieved with
routing restrictions); (iii) the communication interface
with the rest of the DRS must be fixed. The next Sections detail the main
design flow steps to implement a DRS/DSRS.
4.1. Reconfigurable interfaces insertion
To enable the use of reconfigurable IPs, it is necessary
to impose two restrictions in reconfigurable interfaces:
reconfigurable IPs sharing the same region must present
identical interfaces (in terms of number and type of signals)
and identical placement of interface pins. One way to
define reconfigurable interface pins is to insert pre-defined
feedthrough components, named macros. Figure 5(a)
illustrates a system with one fixed IP, two reconfigurable
IPs and macros defining the interface pins. Macros are
inserted in the system description (e.g. VHDL or Verilog).
4.2. Placement constraints
The second step is to constrain the placement of IPs and
macros, as presented in Figure 5(b). A floorplanner tool
may constrain the placement and shape of the system IPs
(fixed and reconfigurable IPs), as well as the placement of
macros. Standard place and route follows the constraints
insertion.
4.3. Routing verification / modification
In the current generation of Xilinx physical synthesis
tools, floorplanning restrictions do not have influence on
the routing tool. As illustrated in Figure 5(b), some wires
can still cross reconfigurable region boundaries. If this
situation occurs, the associated signal can be disconnected
after a reconfiguration step, possibly causing a system
malfunction. This situation pervades all reconfigurable
design flows, including Xilinx Modular Design. In this
case, the designer must either reroute the wire(s) crossing
the interfaces (manually or automatically) or go back to the
previous step, to try different placement constraints. The
final routing must be similar to the one presented in Figure
5(c), where no wire crosses a reconfigurable interface. The
global clock signals are a noticeable exception to this rule:
they can safely cross the whole chip.
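Figure 5 – DRS flow proposed in this work.
(Panels: (a) macro components inserted between the fixed area and the
reconfigurable areas Reconf 1 and Reconf 2; (b) placement constraints, with a
routing problem; (c) final routing; (d) partial bitstreams for Reconf 1 and
Reconf 2.)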
4.4. Partial configurations generation
Partial configurations, or partial bitstreams, are sets of
bits used to configure a DRS. A partial bitstream is generated
by extracting the section of a total bitstream that
corresponds to a reconfigurable region. This is illustrated
in Figure 5(d). It is important to include part of the macro
component in partial bitstreams to connect the
reconfigurable core to the fixed part of the DRS. The
method used here to generate partial bitstreams is
straightforward, a one-phase flow. Assignment of another
core to the same region requires partially repeating the flow
for each core, while keeping the same placement
constraints. Two tools may generate partial bitstreams. The
first one is the proprietary Xilinx tool, BitGen, with
specific commands to define the coordinates of the
reconfigurable core. The second tool, compatible with all
Virtex-II (Pro) devices, was developed by the authors.
4.5. Core relocation
Two situations require partially repeating the DRS
flow. The first one arises with the assignment of different
cores to the same reconfigurable region. The second one
arises with the assignment of the same core to different
reconfigurable regions. It is possible to avoid the second
situation if the same bitstream can be loaded at different
regions. This procedure is named relocation [10]. A core
originally synthesized for one reconfigurable region can be
moved to another one, without re-synthesis. Core
relocation also reduces the memory requirements to store
partial bitstreams, diminishing system cost.
5. Case Studies
This Section presents the implementation of two proof-
of-concept DSRS case studies and their comparison. Table
3 details the characteristics of the OPB-based (Figure 6)
and Artemis-based (Figure 7) case studies. These case
studies allow DSRS design space exploration, evaluating
benefits, gains and limitations of each infrastructure
element.
Table 3 - Case studies implementation characteristics.
Infrastructure Element       | OPB-based DSRS    | Artemis-based DSRS
Configuration Controller     | Software (CC-S)   | Hardware (CC-H)
Reconfigurable Interface     | LUT-Macro         | LUT-Macro
Repository                   | Internal BRAM     | External SRAM
Reconfigurable Port          | ICAP + Xilinx API | ICAP + dedicated hardware
Communication Infrastructure | OPB Bus           | Artemis NoC
5.1. OPB-based DSRS Description
The OPB-based DSRS contains a Microblaze processor,
running an application and the configuration controller
(CC-S). The system also contains several IPs connected to
the OPB bus, as shown in Figure 6.
The design flow to synthesize this DSRS requires
additional steps w.r.t. the one presented in Section 4. A
similar flow is also used in [11]. The steps to build the
OPB-based DSRS are:
• Build an initial system, using the Embedded
Development Kit (EDK) with the Xilinx IPs and the
reconfigurable IP (user function + macros + OPB
wrapper);
• Insert macros to insulate the user function from the
fixed part (Section 4.1). These macros are located
between the IPIF interface and the user function (the
user module template generated by EDK offers to the
user an interface simpler than the OPB bus, named
IPIF). Even if IPIF is simpler than OPB, it has 80
signals (36 from left to right, 44 from right to left),
requiring 11 macros (5 R2F macros, 6 F2R macros),
complicating the floorplanning and routing steps;
• Generate the system netlist with EDK, exporting it to
ISE (Integrated Software Environment);
• Execute the logic synthesis, followed by floorplanning
(Section 4.2) and physical synthesis (Section 4.3). The
result of this step is the complete bitstream of the SoC;
• Import results back to EDK for software generation.
The binary code is finally added to the complete
bitstream.
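The macro counts quoted in the steps above follow from the 8-bit width of each LUT macro. The short sketch below reproduces that arithmetic; the function name is chosen here for illustration.

```python
import math

BITS_PER_LUT_MACRO = 8  # each LUT macro transports 8 bits


def macros_needed(signals: int, width: int = BITS_PER_LUT_MACRO) -> int:
    """Number of unidirectional macros needed to carry `signals` bits."""
    return math.ceil(signals / width)


# IPIF interface of the OPB-based DSRS: 36 signals one way, 44 the other.
ipif_macros = macros_needed(36) + macros_needed(44)

# Router interface of the Artemis-based DSRS: 10 bits one way, 11 the other.
artemis_macros = macros_needed(10) + macros_needed(11)
```

With these inputs `ipif_macros` evaluates to 11 (5 + 6) and `artemis_macros` to 4 (2 + 2), matching the numbers given in the text.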
Figure 6 – The OPB-based DSRS structure.
(Diagram: the fixed SoC part contains the MicroBlaze processor with its ILMB
and DLMB connections, host communication, ICAP controller and ICAP, macro
controller and memory buffer, all attached to the OPB bus; the
reconfigurable SoC part contains the reconfigurable IP, formed by the OPB to
IPIF wrapper, the macros, the user function and the reconf. control signal.)
The above steps are repeated for each reconfigurable IP.
Partial bitstreams (Section 4.4) are extracted from the
obtained complete bitstreams. The OPB-based DSRS was
prototyped in a Memec Insight platform with a Virtex-II
Pro XC2VP30 device.
OPB-based DSRSs have two drawbacks: bus-based
communication and limited internal repository.
Additionally, the design flow is quite complex, since two
software environments are used: EDK and ISE. However,
this simple case study allows reconfiguration time
evaluation using the Xilinx API to access the ICAP
module, and the area consumed to implement the
reconfiguration infrastructure.
5.2. Artemis-based DSRS Description
The Artemis-based DSRS contains a 2x2 NoC used as
communication infrastructure and several IPs as illustrated
in Figure 7. The MR2 processor is a non-pipelined 32-bit RISC processor
based on a load-store MIPS architecture, with 27 distinct
instructions and a 32x32 register file. The
processor uses four internal 18 Kbit RAM blocks as
instruction and data memories, providing 1K words in each
memory. Three different arithmetic IP modules can be used
as reconfigurable IPs: “mult” (multiplies two 16-bit
operands), “div” (divides one 16-bit operand by a 16-bit
operand) and “sqrt” (extracts the square root of a 32-bit
operand).
The processor is the system master. Memory-mapped
instructions access reconfigurable IPs. The following
system operating protocol is used:
• The processor sends a packet to the CC, informing the
identification of the desired IP.
• The CC (i) receives the reconfiguration request; (ii)
selects a reconfigurable area in which to configure the
requested IP (if more than one reconfigurable area is
available); (iii) sends a packet to disconnect
communication between the router and the selected
reconfigurable area; (iv) reads the specific bitstream,
transmitting it to the ICAP.
• After reconfiguration, the CC sends a packet to
reconnect communication between the router and the
configured IP. A second packet is sent to the processor
with the network address where the IP was configured.
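The operating protocol above can be traced step by step with a small simulation. This is a minimal sketch: the function signature, the message strings, the "first free area" selection policy and the router address used in the usage note are assumptions for illustration; the paper does not specify them.

```python
from typing import Callable, Dict, List, Tuple


def reconfigure(
    ip_name: str,
    free_areas: List[str],
    bitstreams: Dict[str, bytes],
    icap_write: Callable[[bytes], None],
) -> Tuple[str, List[str]]:
    """Trace the CC protocol for one reconfiguration request.

    Returns the network address of the configured IP and the ordered
    list of messages exchanged.
    """
    log: List[str] = []
    log.append(f"processor -> CC: request {ip_name}")
    area = free_areas[0]  # assumed policy: pick the first available area
    log.append(f"CC -> router {area}: disconnect")
    icap_write(bitstreams[ip_name])  # read the bitstream, send it to the ICAP
    log.append(f"CC -> ICAP: bitstream for {ip_name}")
    log.append(f"CC -> router {area}: reconnect")
    log.append(f"CC -> processor: {ip_name} at {area}")
    return area, log
```

Calling `reconfigure("div", ["11"], {"div": b"..."}, sink.append)` with a list `sink` returns the chosen address and a five-entry log in which the disconnect message always precedes the ICAP transfer and the reconnect message always follows it.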
Figure 7 – Artemis-based DSRS.
(Diagram: a 2x2 NoC with routers 00, 01, 10 and 11; the fixed SoC part
contains the configuration memory, ICAP, configuration controller,
instruction/data memory, MR2 processor and host communication; the
reconfigurable SoC part contains the reconfigurable IP, formed by the macros
and the user function.)
The Artemis-based DSRS was also prototyped in a
Memec Insight platform with a Virtex-II Pro XC2VP30
device. The design flow used to synthesize this DSRS
employs the straightforward flow presented in Section 4.
This is simpler than the flow used for the OPB-based
DSRS, since only the ISE environment needs to be used.
Except for the configuration controller, this DSRS
follows the recommended choices to implement DSRS.
The configuration controller is implemented in hardware,
favoring performance, but reducing flexibility.
5.3. Infrastructure comparison
A common choice for both experiments presented is the
use of LUT macros. LUT macros were employed in the
OPB-based DSRS due to the number of bits in the
reconfigurable interface (80), therefore reducing the
number of CLB rows when compared to Xilinx Bus
Macros. The LUT macros had to be extended to occupy 4
CLB columns each to achieve successful interface routing.
The Artemis-based DSRS has a less complex interface (21
bits), using four LUT macros, exactly as presented in
Figure 4, and occupying only 2 CLB columns each.
A second common choice in both experiments is the
ICAP configuration port. The first case study uses the
Xilinx API to access the ICAP port, while the second case
study uses a dedicated module developed to access the
ICAP port. As already mentioned, the Xilinx API is slower
than dedicated hardware due to current buffering
requirements. Table 4 compares the partial bitstream sizes
and reconfiguration times.
The third column presents partial bitstream sizes. Partial
bitstreams of the OPB-based DSRS occupy 10 CLB
columns, while for the Artemis-based DSRS they occupy 6
CLB columns¹. It is possible to store partial bitstreams of
the OPB-based DSRS in internal BRAMs because a simple
compression algorithm was applied to partial bitstreams,
based on zeroes/ones counting. On-the-fly software
decompression is executed before sending bitstreams to the
ICAP controller. This decompression incurs no time
penalty, owing to the simplicity of the algorithm. The
Artemis-based DSRS stores partial bitstreams in a 1 Mbyte
external SRAM. The Artemis-based DSRS stores up to 10
partial bitstreams, without compression, while the OPB-based
DSRS is able to store only 2 partial bitstreams using
compression.
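The text does not detail the compression algorithm beyond "zeroes/ones counting". One plausible reading is run-length encoding of consecutive equal bits; the sketch below illustrates that reading only and should not be taken as the authors' actual scheme. All names are chosen here for illustration.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (bit value, run length)


def to_bits(data: bytes) -> List[int]:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def compress(data: bytes) -> List[Run]:
    """Count consecutive zeroes and ones."""
    runs: List[Run] = []
    for bit in to_bits(data):
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            runs.append((bit, 1))
    return runs


def decompress(runs: List[Run]) -> bytes:
    """Rebuild the original bytes from the run list."""
    bits = [bit for bit, length in runs for _ in range(length)]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)
```

Such a scheme pays off on data dominated by long runs of identical bits, and decompression is a single linear pass, which is consistent with the claim that it can run on the fly in software.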
Table 4 – Reconfiguration times† for OPB and Artemis based DSRS case studies.
Case study    | Module name  | Partial bitstream size (Kbytes) | Reconf. time, CC-H | Reconf. time, CC-S | Minimal reconf. time
OPB-based     | Arith. 1 / 2 | 182.180                         | -                  | 63.55              | 3.64
Artemis-based | Multiply     | 99.644                          | 9.98               | 34.76*             | 1.99
Artemis-based | Divider      | 96.428                          | 9.65               | 33.63*             | 1.93
Artemis-based | Square Root  | 101.988                         | 10.21              | 35.57*             | 2.04
†Times are expressed in milliseconds and reconfigurations run at 50 MHz.
*Estimated, using data from the OPB-based system.
The fourth and fifth columns present the reconfiguration
time using the CC-H and CC-S configuration controllers.
Reconfiguration with CC-H is on average three times
faster than with CC-S, considering the NoC protocol.
Reconfiguration times were measured using two methods:
internal FPGA timers and a logic analyzer.
The sixth column presents the minimal reconfiguration
time, assuming it would be possible to transmit one partial
bitstream byte per clock cycle (at 50 MHz). This column
shows that it is not possible to achieve reconfiguration
times below 1 ms in the current case studies, with
reconfigurable IPs using 6 to 10 CLB columns. With more
complex reconfigurable IPs, the reconfigurable area is
expected to increase, consequently increasing the
reconfiguration time.
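The minimal reconfiguration time is a direct consequence of the one-byte-per-cycle assumption. The sketch below reproduces the last column of Table 4; it assumes that the sizes in the table are byte counts (for example 96,428 bytes for the divider), which is the reading under which the published values are recovered.

```python
CLOCK_HZ = 50_000_000  # reconfiguration clock stated in the text

# Partial bitstream sizes from Table 4, read here as bytes (assumption).
SIZES_BYTES = {
    "Arith. 1 / 2": 182_180,
    "Multiply": 99_644,
    "Divider": 96_428,
    "Square Root": 101_988,
}


def minimal_reconf_ms(size_bytes: int, clock_hz: int = CLOCK_HZ) -> float:
    """Time in ms to send `size_bytes` at one byte per clock cycle."""
    return size_bytes / clock_hz * 1e3


minimal_ms = {name: round(minimal_reconf_ms(size), 2)
              for name, size in SIZES_BYTES.items()}
```

Under that assumption `minimal_ms` holds 3.64, 1.99, 1.93 and 2.04 ms for the four modules, all above the 1 ms mark mentioned in the text.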
Figure 8 details the reconfiguration time for the divider
IP. The reconfiguration time, 9.65 ms, is equivalent to
482,500 clock cycles. Observe that 99.94 % of this
reconfiguration time is spent by the reconfiguration process
itself (Figure 8(c)), with a very small time spent in the NoC
with control packets.
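The percentages quoted for the divider can be checked from the phase durations of Figure 8. In the sketch below, the assignment of cycle counts to phases (a) to (e) is read from the figure in timeline order and should be treated as an assumption; the total and the share of phase (c) do not depend on how the small values are assigned.

```python
CLOCK_HZ = 50_000_000

# Clock cycles per phase, read from Figure 8 (assignment of the small
# values to phases a, b, d, e is an assumption).
PHASES = {"a": 145, "b": 4, "c": 482_221, "d": 4, "e": 126}

total_cycles = sum(PHASES.values())
total_ms = total_cycles / CLOCK_HZ * 1e3
reconf_share = PHASES["c"] / total_cycles * 100  # percent spent in (c)
protocol_cycles = total_cycles - PHASES["c"]     # NoC control overhead
```

The total is 482,500 cycles, that is 9.65 ms at 50 MHz, and phase (c) accounts for about 99.94% of it, leaving 279 cycles for the NoC control protocol.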
After reconfiguration, the protocol to access the
reconfigurable IP comprises three steps: (i) creation and
transmission of a packet with the operands to the
reconfigurable IP; (ii) creation and transmission of a read
packet to receive results; (iii) reception of the result packet
from the reconfigurable IP. Typical time spent in each step
is 173, 141 and 117 clock cycles respectively. As the
reconfigurable IPs are very simple in this case study, once
the read request arrives at the reconfigurable IP, the packet
with the results is sent immediately to the source IP,
totaling on average 439 clock cycles (sum of the time
spent in each step). This protocol can be simplified by
eliminating the read packet (141 cycles), sending the
answer from the reconfigurable IP directly to the source IP.
¹ Different bitstream sizes for the same number of CLB columns
exist because partial bitstreams are generated by BitGen, which
uses the multi-frame write feature.
Figure 8 – Reconfiguration protocol timing, in clock
cycles, for Artemis-based DSRS:
(a) packet from a source IP to the CC asking for a new reconfigurable IP: 145
(b) CC processing time and packet to the reconfigurable area to disconnect it: 4
(c) reconfiguration time: 482,221
(d) packet from the CC to the new reconfigurable IP reconnecting it: 4
(e) packet from the CC to the source IP with the reconfigurable IP address: 126
At 50 MHz, 10 ms represent 500,000 clock cycles. This
reconfiguration time can be hidden by: (i) executing
complex computations in hardware; (ii) pre-fetching
reconfigurable IPs for later use; (iii) reusing the same
reconfigurable IPs over a period longer than the execution
in software plus the time to configure the IP into the DSRS.
With such strategies, the reconfiguration time has minimal
impact on DSRS performance. For example, if a given
function executed in hardware is 500 clock cycles faster
than an equivalent software implementation, the
reconfiguration time is recovered after 1,000 consecutive
executions, and beyond that point the hardware implementation
displays superior performance. This can be easily achieved
with image processing algorithms, where the same
operation is repeated thousands of times.
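The break-even point in the example above is the reconfiguration cost divided by the gain per execution. A minimal sketch of that calculation follows; the function name is chosen here for illustration.

```python
import math


def break_even_executions(reconf_cycles: int, gain_per_execution: int) -> int:
    """Executions needed for the accumulated hardware gain to cover
    the reconfiguration cost (all values in clock cycles)."""
    if gain_per_execution <= 0:
        raise ValueError("hardware must be faster than software to break even")
    return math.ceil(reconf_cycles / gain_per_execution)


# Example from the text: 10 ms at 50 MHz, 500 cycles gained per execution.
executions = break_even_executions(500_000, 500)
```

Here `executions` evaluates to 1000; a smaller gain per execution raises the break-even count proportionally.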
For these proof-of-concept case studies, the average
execution time for the equivalent software implementation
is 26% slower (on average 600 clock cycles against 439
clock cycles). This difference in favor of the hardware
implementation, 161 cycles, is not yet sufficient to
demonstrate performance gains for the proposed
infrastructure, but clearly shows its viability. Some
application portions (typically loops) may benefit from this
approach, provided they consume at least 1,000 clock cycles in
the embedded processor and are repeatedly used.
Table 5 and Table 6 report the area needed to implement both
DSRSs. The first analysis concerns the configuration
controller (CC) area overhead. The CC-H uses 494 slices.
The CC-S is built from the MicroBlaze subsystem (processor,
peripherals and OPB, 821 slices in Table 6) plus the ICAP and
macro controllers. However, if a processor is already available in
the system (such as MicroBlaze), the area of the CC-S
reduces to that of the ICAP and macro controllers,
that is, 250 slices. As processors are ubiquitous in
current SoCs, a software CC represents the implementation
option with smaller area overhead.
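The two controller options can be compared directly from the slice counts in Table 5 and Table 6. The sketch below is illustrative: the dictionary and function names are assumptions, and the device total of 13,696 slices is the XC2VP30 figure given in the table headers.

```python
TOTAL_SLICES = 13_696  # XC2VP30, from the headers of Tables 5 and 6

# Slice counts from Table 6 (software-controlled configuration, CC-S).
PROCESSOR_SUBSYSTEM = {
    "MicroBlaze": 571,
    "MicroBlaze Perip.": 160,
    "MicroBlaze OPB": 90,
}
CONTROLLERS = {"ICAP Controller": 151, "Macro Controller": 99}

CC_H_SLICES = 494  # hardware configuration controller, Table 5


def percent_of_device(slices: int) -> float:
    """Share of the device's slices, in percent."""
    return 100 * slices / TOTAL_SLICES


processor = sum(PROCESSOR_SUBSYSTEM.values())
controllers = sum(CONTROLLERS.values())

print(f"CC-H:                     {CC_H_SLICES} slices "
      f"({percent_of_device(CC_H_SLICES):.2f}%)")
print(f"CC-S, processor reused:   {controllers} slices "
      f"({percent_of_device(controllers):.2f}%)")
print(f"CC-S, processor included: {processor + controllers} slices "
      f"({percent_of_device(processor + controllers):.2f}%)")
```

In Table 6 the three MicroBlaze rows sum to 821 slices and the two controller rows to 250 slices, so the software option is the smaller one only when the processor would be present in the system anyway.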
The area of the Artemis NoC is 1167 slices (Table 5),
representing on average about 290 slices per router. For this case
study, this area represents a significant overhead, since each
reconfigurable IP is smaller than a router. In
practice, when using real IPs, an area overhead of 5-10%
per IP is expected, justifying the use of NoCs in DSRSs.
For comparison, routers in the Gecko platform [5]
consume 611 slices (router plus network
interfaces, data and control).
Table 5 - Artemis-based DSRS area report (XC2VP30).
IP                  # Slices (total: 13696)   # FF (total: 27392)
                    Total    Percentage       Total    Percentage
Serial                316    2.31%              279    1.02%
Processor            1001    7.31%              555    2.03%
CC (CC)               494    3.61%              294    1.07%
Artemis NoC          1167    8.52%              959    3.50%
DIV (reconf IP)       183    1.34%              259    0.95%
MULT (reconf IP)      172    1.26%              259    0.95%
SQRT (reconf IP)      223    1.63%              269    0.98%
Table 6 - OPB-based DSRS area report (XC2VP30).
IP                  # Slices (total: 13696)   # FF (total: 27392)
                    Total    Percentage       Total    Percentage
MicroBlaze            571    4.17%              366    1.34%
MicroBlaze Perip.     160    1.17%               75    0.27%
MicroBlaze OPB         90    0.66%               11    0.04%
ICAP Controller       151    1.10%              155    0.57%
Macro Controller       99    0.72%              136    0.50%
Arith1 (reconf IP)    128    0.93%              168    0.61%
Arith2 (reconf IP)    128    0.93%              168    0.61%
6. Conclusion and Future Work
The main contribution of this work is the proposal of a
conceptual DSRS architecture, summarized in Table 2,
centered on the use of a NoC interconnection. The
implementation of two proof-of-concept case studies
demonstrates the viability of the proposed DSRS
architecture, even though neither case study follows all
recommendations. However, each recommendation in
Table 2 was implemented and evaluated by at least one of the
case studies. To support the development of the proposed
DSRS architecture, the paper advanced two additional
contributions: (i) a suggestion of a straightforward DSRS
design flow; (ii) the design of a specific NoC supporting
partial and dynamic hardware reconfiguration.
The ideal implementation choice for this DSRS
architecture is dedicated ASICs with embedded
reconfigurable areas. Nonetheless, partial and dynamic
reconfigurable FPGAs were used to successfully prototype
the architecture. The main advantage of the suggested flow
is a reduced number of steps compared to other flows
proposed in the literature, such as Modular Design. The
proposed flow employs new macros, which guarantee the
correct operation of the rest of the system during
reconfiguration, avoiding the use of tristate buffers, which
are scarce in Virtex FPGAs. Also, the
new macros enable the use of communication architectures
other than buses to link reconfigurable modules to other
parts of the system. To support dynamic IP
reconfiguration, the paper showed the need to add services
to ordinary NoCs. Three services were identified as necessary:
IP insulation, packet discarding and IP reconnection.
These services were implemented over the existing Hermes
NoC, resulting in the Artemis NoC, which supports DSRS.
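To make the three services concrete, the following is a minimal behavioral sketch of a single router port. It is not the Artemis implementation. The class `RouterPort`, its methods and the packet representation are all hypothetical; the sketch only assumes that an insulated port drops traffic addressed to the IP being reconfigured until the port is reconnected.

```python
from dataclasses import dataclass, field


@dataclass
class RouterPort:
    """Hypothetical model of the port linking a router to a reconfigurable IP."""
    connected: bool = True
    delivered: list = field(default_factory=list)
    discarded: int = 0

    def insulate(self) -> None:
        """IP insulation: detach the IP before its area is reconfigured."""
        self.connected = False

    def reconnect(self) -> None:
        """IP reconnection: attach the newly configured IP to the network."""
        self.connected = True

    def receive(self, packet: str) -> bool:
        """Deliver a packet, or discard it while the IP is insulated."""
        if not self.connected:
            self.discarded += 1  # packet discarding
            return False
        self.delivered.append(packet)
        return True


port = RouterPort()
port.receive("operands-1")  # delivered to the current IP
port.insulate()             # reconfiguration of the IP area begins
port.receive("operands-2")  # dropped while the IP is being replaced
port.reconnect()            # the new IP is in place
port.receive("operands-3")  # delivered to the new IP
print(port.delivered, port.discarded)
```

Keeping the state in the router port rather than in the IP reflects the idea that the rest of the network must keep operating while one IP is absent.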
The evaluation of the case studies helped to identify the area
overhead incurred by the proposed infrastructure and the
reconfiguration time. The addition of a Configuration
Controller to a SoC represents a small area overhead (1.82
to 3.61% of the available slices on the XC2VP30 device),
while providing greater flexibility to the system. The
addition of hardware flexibility to a SoC makes it possible to
implement the same function both in software and in
hardware. The user or the operating system may select the
implementation according to performance requirements.
The experiments showed that, regardless of whether
reconfiguration is controlled in software or
hardware, IP reconfiguration time is always above 2 ms for
current FPGA technologies (measured times were between
9.65 ms and 63.55 ms). Taking 10 ms as a representative
value, this amounts to about 500,000 clock cycles at 50 MHz.
The time measured to send data to
the reconfigurable IP, and to receive data from it, through
the NoC is around 439 clock cycles. Performance gains can
be easily obtained with loops of small to medium complexity
(1,000 clock cycles) or with more complex IPs.
7. References
[1] Keutzer, K.; Newton, A.R.; Rabaey, J.M.; Sangiovanni-
Vincentelli, A. “System-Level Design: Orthogonalization of
Concerns and Platform-Based Design”. IEEE Transactions
on CAD of Integrated Circuits and Systems, vol. 19 (12),
Dec. 2000, pp. 1523-1543.
[2] Van den Branden, G.; Touhafi, A.; Dirkx, E. “A design
methodology to generate dynamically self-reconfigurable
SoCs for Virtex-II FPGAs”. In: FPT’05, 2005, pp. 325-326.
[3] Dally, W.; Towles, B. “Route Packets, Not Wires: On-Chip
Interconnection Networks”. In: DAC’01, 2001, pp. 684-689.
[4] Benini, L.; De Micheli, G. “Networks on Chips: a New SoC
Paradigm”. Computer, vol. 35 (1), Jan. 2002, pp. 70-78.
[5] Marescaux, T.; Nollet, V.; Mignolet, J.-Y.; Bartic, A.;
Moffat, W.; Avasare, P.; Coene, P.; Verkest, D.; Vernalde,
S.; Lauwereins, R. “Run-Time Support for Heterogeneous
Multitasking on Reconfigurable SoCs”. Integration, the
VLSI Journal, vol. 38 (1), Oct. 2004, pp. 107-130.
[6] Lim, D.; Peattie, M. “Two Flows for Partial
Reconfiguration: Module Based or Small Bit
Manipulations”. Xilinx Application Note 290 (v1.0), 2002.
[7] Palma, J.; Mello, A.; Möller, L.; Moraes, F.; Calazans, N.
“Core Communication Interface for FPGAs”. In: SBCCI’02,
2002, pp. 183-188.
[8] Huebner, M.; Paulsson, K.; Becker, J. “Parallel and Flexible
Multiprocessor System-On-Chip for Adaptive Automotive
Applications based on Xilinx MicroBlaze Soft-Cores”. In:
IPDPS’05, 2005, pp. 149a-149a.
[9] Moraes, F.; Calazans, N.; Mello, A.; Möller, L.; Ost, L.
“HERMES: an Infrastructure for Low Area Overhead
Packet-switching Networks on Chip”. Integration, the VLSI
Journal, vol. 38 (1), Oct. 2004, pp. 69-93.
[10] Krasteva, Y.; Jimeno, A.; Torre, E.; Riesgo, T. “Straight
Method for Reallocation of Complex Cores by Dynamic
Reconfiguration in FPGAs”. In: RSP’05, 2005, pp. 77-83.
[11] Donato, A.; Ferrandi, F.; Santambrogio, M.D.; Sciuto, D.
“Caronte: a complete methodology for the implementation
of partially dynamically self-reconfiguring systems on
FPGA platforms”. In: FCCM’05, 2005, pp. 321-322.