Generic Partial Dynamic Reconfiguration Controller for Fault Tolerant Designs Based on FPGA

Martin Straka, Jan Kastil, Zdenek Kotasek
Brno University of Technology, Faculty of Information Technology
Bozetechova 2, Brno, 612 66, Czech Republic
{strakam, ikastil, kotasek}

978-1-4244-8971-8/10$26.00 © 2010 IEEE

Abstract: In recent years, many techniques for the self-repair of systems implemented in FPGAs have been developed and presented. The basic problem of these approaches is the large overhead of the unit that controls the partial reconfiguration process. Moreover, these solutions are generally not implemented as fault tolerant systems. In this paper, a small and flexible generic partial dynamic reconfiguration controller implemented inside the FPGA is presented. The basic architecture of the controller and its usage in an FPGA-based fault tolerant structure are described. The implementation of the controller as a fault tolerant component is described as well. The basic features and synthesis results of the controller for a Xilinx FPGA are presented and compared with a MicroBlaze solution.

I. INTRODUCTION

Digital systems can be implemented on various platforms. The Field Programmable Gate Array (FPGA), such as the Xilinx SRAM-based FPGA family, is a typical example of such a platform. Many FPGA-based designs are constructed as Fault Tolerant (FT) systems with the possibility of recovering from errors by means of reconfiguration procedures [1]. Different FT architectures are known to improve reliability in digital systems; Triple Modular Redundancy (TMR) and duplex systems can serve as examples [2].

The design of dependable FT systems in SRAM-based FPGAs includes three problems: error detection during system operation, fast fault location, and fast repair by means of chip reconfiguration [3]. When errors are detected in any part of a system implemented in an FPGA, there exists a possibility to extend its lifetime [4]. For this purpose, the Partial Dynamic Reconfiguration (PDR) of the FPGA circuit can be used [5].

In an SRAM-based FPGA, the combinational and sequential logic is implemented in programmable complex logic blocks, which are customized by loading configuration data into the SRAM cells of the program memory [6]. When an error appears in a cell of the program memory (possibly because the cell was struck by a charged particle), the stored value can be inverted. This can modify the design functionality and is called a Single Event Upset (SEU) [7].

Modern FT architectures using PDR often utilize microprocessors embedded into the FPGA, such as PowerPC or MicroBlaze [8]. These microprocessors can perform complex tasks during the PDR process, such as bitstream decompression, or can even change the bitstream structure [9] [10].

In [11], an FT scheme for the Xilinx Virtex4 FPGA is presented. The scheme consists of two parts: the Partially Reconfigurable Functional Region (PRFR) with several Partial Reconfigurable Modules (PRMs), and a reconfiguration controller which is based on the built-in Xilinx PowerPC-405 processor.

II. MOTIVATION AND GOALS OF THE RESEARCH

Today, many techniques for the self-repair of systems implemented in SRAM-based FPGAs have been developed and presented. The basic problem of these approaches is the very large overhead of the units controlling the reconfiguration process. For this purpose, embedded processors such as PowerPC or MicroBlaze can be used. The processor controlling the reconfiguration process is a critical part of the system. If the microprocessor fails, then the whole system will also fail. Therefore, it is very important to ensure that the software in the microprocessor will always work correctly. This can be assured only by the formal verification of all software tools of the processors. Moreover, an SEU can impact the functionality of the processor itself. Most modern microprocessors are not built as SEU tolerant and therefore they fail if an SEU changes the contents of their registers or memories. MicroBlaze is even more susceptible to faults caused by SEUs, since its own structure can be changed as well. To build an effective partial dynamic reconfiguration controller based on a microprocessor, the microprocessor itself must be implemented as an FT component, which causes an additional overhead. Fortunately, only a very limited functionality is required for the partial reconfiguration controller (PRC) itself. Therefore it is possible to build the PDR controller directly in the FPGA fabric.

A. Previous Work

The activities which aim at defining a methodology for the design of FT systems on FPGA platforms were presented in [12]. The main principles of PDR were described together with the FT architectures based on PRMs in [13]. Several architectures use online checkers or other Concurrent Error Detection (CED) techniques for error detection. These error detection techniques were described in [12]. If an error is detected by a checker in a unit, the PRC initiates the reconfiguration process of this unit.

The simple structure of the methodology for FPGA-based FT system design with the PRC inside the FPGA can be seen in Figure 1.

[Fig. 1. The structure of FT designs in FPGA based on PRMs. Dynamic part: FT architectures 1 to n, each composed of PRMs. Static part: other units, the ICAP unit, and the Partial Reconfiguration Controller with the partial bitstreams storage.]

This paper presents a new approach for driving the PDR process by a simple Generic Partial Dynamic Reconfiguration Controller (GPDRC) implemented in the FPGA circuit. The goal of our research can be defined in the following way: we want to implement the GPDRC as a small FT unit and verify its functionality in an FT structure.

The paper is organized as follows. First, the architecture and basic features of the GPDRC and its role in the FT structure (the detection of a faulty PRM and the initiation of the PDR process of the faulty module) are described (section III). The experiments and results with the GPDRC and its properties are presented in section IV. Conclusions and ideas for our future research are summarized in section V.

III. GENERIC PARTIAL DYNAMIC RECONFIGURATION CONTROLLER

This work assumes that the FT structure for FPGA-based FT system design consists of three parts (see Figure 1): the static part, the dynamic part and the reconfiguration controller. Components which are not designed as FT are included in the static part. The static part should not be reconfigured. The dynamic part consists of FT architectures which can be reconfigured by means of PDR and therefore must be divided into PRMs with error output signals. The last part of the FPGA contains the GPDRC. The presented FT structure supports the detection and correction of all faults caused by SEUs and the detection of hard errors in a PRM (damage to the chip that cannot be repaired). It is important to note that the Internal Configuration Access Port (ICAP) is used primarily, but the system can be used with other configuration interfaces. Moreover, ICAP is accessible from the FPGA logic and therefore can operate much faster than other configuration interfaces. If the full throughput of ICAP is used, the smallest PRM can be loaded into the configuration memory in 28 µs.

A. Architecture and Features of GPDRC

The interface of the GPDRC can be seen in Figure 2. The input signals consist of three control signals and two data signals. Error signals from all PRMs in the FT structure are connected to the GPDRC inputs. The bitstream input signal loads frames of configuration data from external memory (flash or ROM) or any other reliable source. The valid signal represents the acknowledgment of the storage controller that the data sent from memory are correct. The output signals of the GPDRC indicate the occurrence of a hard error in any PRM and the index of that PRM. The sync output signal provides external synchronization of the PRMs in the FT architecture after the reconfiguration process. The addr_bitstr bus carries the address, in the bitstream memory, of the required byte of the bitstream.

[Fig. 2. Interface of GPDRC for FT system implemented in SRAM-based FPGA as PRMs. Inputs: #Err-PRMs (n), bitstream (32), valid, rst, clk. Outputs: hard error, PRM index out (log(n)), addr_bitstr (x), sync.]

In Figure 3, the detailed architecture of the GPDRC developed by our team is shown. The architecture of the GPDRC consists of five units, one FIFO memory and one FSM which drives each unit.

[Fig. 3. Architecture of GPDRC for FT system implemented in SRAM-based FPGA as PRMs. Blocks: PRMs Error Register File, Round Robin Unit, Generic Error Encoder, Hard Error Detection Unit, Safety Window Unit, Look Up Unit with Address Unit (address counter), FSM, ECC Unit, FIFO, ICAP interface wrapper, Partial Bitstreams Storage for PRMs (FLASH, ROM).]

If more than one SEU occurs (more error signals are active), a round robin algorithm chooses one of the PRMs which should be reconfigured. The Generic Error Encoder (GEE) encodes the binary index of this PRM and sends this identification number, together with an error identification signal, to the Look Up Unit (LUU) and the Hard Error Unit (HEU). The HEU detects hard errors in PRMs after the reconfiguration process. If the error still exists in the PRM after reconfiguration, it is regarded as a hard error.

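The selection and hard-error steps described above can be illustrated with a minimal behavioural sketch. It is written in Python rather than the VHDL used for the actual design, and it is not the authors' implementation: the names RoundRobinArbiter, repair_round and the reconfigure callback are hypothetical, and the rule "error still active after one reconfiguration means hard error" is a simplifying assumption about the HEU criterion.

```python
from typing import Callable, List, Optional, Set


class RoundRobinArbiter:
    """Behavioural model (assumption): grants one active error line per round."""

    def __init__(self, n_prms: int) -> None:
        self.n_prms = n_prms
        # Start "behind" PRM 0 so that the first search begins at index 0.
        self.last = n_prms - 1

    def select(self, errors: List[bool]) -> Optional[int]:
        """Return the index of the next PRM with an active error, or None."""
        for offset in range(1, self.n_prms + 1):
            idx = (self.last + offset) % self.n_prms
            if errors[idx]:
                self.last = idx
                return idx
        return None


def repair_round(
    arbiter: RoundRobinArbiter,
    errors: List[bool],
    reconfigure: Callable[[int], bool],
    hard_errors: Set[int],
) -> Optional[int]:
    """Run one repair round and return the index of the PRM that was served.

    `reconfigure(idx)` is a hypothetical stand-in for the PDR of PRM `idx`;
    it returns True if the error signal is still active afterwards.
    """
    idx = arbiter.select(errors)
    if idx is None:
        return None
    errors[idx] = reconfigure(idx)
    if errors[idx]:
        # Error survived reconfiguration: report it as a hard error.
        hard_errors.add(idx)
    return idx
```

In this model, a PRM with permanent damage keeps its error line active, but because the search always starts one position after the last grant, every other faulty PRM is served before the damaged one is retried.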
From the binary index, the LUU derives the starting and ending address of the partial bitstream that corresponds to the faulty PRM, and starts reading the bitstream from external memory into the FIFO. The LUU contains an address unit which is implemented as a single counter with a simple FSM. When the partial bitstream has been loaded from memory into the FIFO and the parity of every frame of the bitstream is correct (ECC unit), the FSM starts the reconfiguration process through ICAP. The Safety Window Unit (SWU) ensures that all PRMs are synchronized before a new round of the reconfiguration process can start.

After a successful reconfiguration, the system continues with the next error until all of them are repaired. As long as the operation of every module is affected by at most one SEU, it is guaranteed that the whole system operates correctly. If some of the errors cannot be repaired by PDR, the round robin arbiter assures that such errors will not block the repair process of the remaining PRMs.

B. Synchronization Problem and Safety Window

PDR is able to repair a fault that caused an error in a PRM, but the state of the module after the reconfiguration process is undefined. There are two methods for setting the internal state of the unit to the correct value. The first method copies the current state from another implementation of the unit in the system. This method is relatively complex but can be used in any type of system. The second method synchronizes the units before they receive work for the next packet by generating a local reset signal at the end of the current packet. This method has limitations and can be used only in systems with packet processing. The second method should be preferred whenever possible because its implementation is simple and cheap.

[Fig. 4. Timing diagram with synchronization of PRMs after reconfiguration. Signals: Data, Loc_reset, Error, Reconf, Sync; the safety window is marked.]

The synchronization of the repaired unit is not done instantly. If there is only one failure in the system, it is possible that the reconfiguration process tries to repair an already repaired unit which is still waiting for synchronization. The unit will not work correctly after the second reconfiguration either, because it still has to wait for synchronization. The HEU, however, would evaluate the error as unrepairable, since it was not repaired by two reconfigurations. To solve this problem, a safety window is introduced into the reconfiguration system (see Figure 4). The safety window is the minimal time required between two reconfigurations of the same unit. The length of the safety window depends on the implemented system and on the synchronization method. There is a tradeoff between implementation complexity and the speed of reconfiguration in the implementation of the safety window. The smallest implementation is a single safety window in the GPDRC (the SWU in Figure 3). In that case, the window has to be long enough for all units in the system, and therefore units with fast synchronization cannot be repaired faster than the other units. The other extreme would be to implement a safety window in every PRM to guarantee that the next repair process can be performed as soon as possible.

C. GPDRC Implemented as Fault Tolerant System

The GPDRC itself can be implemented as fault tolerant. In this case, the fault tolerant parts of the GPDRC must be implemented in the dynamic part of the FPGA and must be divided into PRMs with error output signals in the same fashion as any other functional unit (FU). The FT parts of the GPDRC are implemented as a TMR system or as a duplex system with comparators. If an error is detected in any PRM of the GPDRC, then this unit has to be reconfigured first. Therefore the bitstreams for the GPDRC are located at the beginning of the bitstream storage, and the error signal resets the address counter and starts the reconfiguration process, which effectively reconfigures a part of the GPDRC without interrupting the functionality of the device.

IV. EXPERIMENTS AND RESULTS

The architecture of the GPDRC described above was implemented for an SRAM-based FPGA FT design in the VHDL language. Xilinx ISE 11.4 was used for the synthesis. ISE 11.4 supports PDR in Virtex5, which is available on the ML506 development board. The correct function of the error detecting circuitry was tested on this board in the early phase of this work. For experimenting with the methodology and the GPDRC, a simple digital circuit was developed which contains several counters, decoders, multiplexers and other additional logic. These components of the circuit were implemented as FT architectures divided into PRMs by our methodology. We compared basic features and properties of the GPDRC. The following features were considered:

• the sizes of the functional units of the GPDRC (numbers of LUTs and flip-flop registers) for 100 PRMs
• the size and overheads of the GPDRC implemented as an FT system
• the comparison of the GPDRC size (in slices) with the size of the MicroBlaze solution
• the parameters of the GPDRC (size in slices, frequency of the design) for different numbers of PRMs
• the probability that the GPDRC fails if an SEU occurs in the architecture

The results of the GPDRC and MicroBlaze synthesis into Virtex5 XC5VSX50T and the numbers of resources can be seen in Table I. The meaning of the columns is as follows: column 1 - the name of the component in the GPDRC architecture, column 2 - the size of the component and its utilization of FPGA resources, column 3 (4) - the number of LUTs (flip-flops), column 5 - the size of the FT GPDRC and its overhead.

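As a rough cross-check of the overhead factor in column 5, the following sketch assumes that the factor is simply the ratio of the TMR size to the plain size, both in slices. That interpretation is an assumption, not something the text states explicitly; the numeric values are transcribed from Table I, and the names TABLE_I, overhead_factor and report are hypothetical.

```python
# Values transcribed from Table I:
# (component, size [slices], TMR size [slices], published overhead factor)
TABLE_I = [
    ("Round Robin Unit", 91, 214, 2.4),
    ("Error Encoder", 74, 165, 2.2),
    ("Hard Error Unit", 38, 113, 2.9),
    ("Safety Window Unit", 11, 98, 9.0),
    ("ECC Unit", 7, 16, 2.3),
    ("Address Unit", 60, 108, 1.8),
    ("FSM", 13, 29, 2.2),
    ("GPDRC total", 324, 858, 2.6),
    ("MicroBlaze IP core", 613, 1531, 2.5),
]


def overhead_factor(plain_slices: int, tmr_slices: int) -> float:
    """Assumed definition: TMR size divided by the non-redundant size."""
    return tmr_slices / plain_slices


def report() -> list:
    """Return (component, computed factor, published factor) for each row."""
    rows = []
    for name, plain, tmr, published in TABLE_I:
        rows.append((name, round(overhead_factor(plain, tmr), 2), published))
    return rows


if __name__ == "__main__":
    for name, computed, published in report():
        print(f"{name:20s} computed {computed:.2f}x, published {published}x")
```

Under this assumption the computed ratios agree with the published factors to within 0.1 for every row, which suggests, but does not prove, that the column was derived this way.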
TABLE I
NUMBERS OF FPGA RESOURCES FOR GPDRC (ML506, Virtex5, 100 PRMs)

Component             Size [slices]   # LUTs [-]   # F/Fs [-]   TMR [slices]
Round Robin Unit       91 (1.1%)        101          101         214 (2.4x)
Error Encoder          74 (1.0%)        110            0         165 (2.2x)
Hard Error Unit        38 (0.6%)         36          103         113 (2.9x)
Safety Window Unit     11 (0.2%)         31           25          98 (9.0x)
ECC Unit                7 (0.1%)          8            1          16 (2.3x)
Address Unit           60 (0.8%)        138           21         108 (1.8x)
FSM                    13 (0.2%)         28           13          29 (2.2x)
GPDRC total           324 (4.0%)        509          262         858 (2.6x)
MicroBlaze IP core    613 (7.5%)       1333         1328        1531 (2.5x)

The probability that the GPDRC fails if an SEU occurs in the architecture is 3.97%; for the MicroBlaze IP core it is 7.52%. In the Virtex5 XC5VSX50T FPGA, 204 PRMs can be created. Figure 5 shows how the number of slices increases with the number of PRMs. The size of the GPDRC increases almost linearly with the number of PRMs, and the frequency of the GPDRC stays at about 230 MHz. This frequency is sufficient because the speed of the ICAP interface is 100 MHz.

[Fig. 5. Size and frequency of GPDRC based on #PRMs. Horizontal axis: number of PRMs (0 to 200). Vertical axes: number of occupied slices and frequency after synthesis [MHz]. Curves: size of GPDRC, frequency after synthesis, and a linear approximation of each.]

V. CONCLUSIONS AND FUTURE WORK

In this paper, the basic principle of the methodology for FT system design applying PDR was presented. Our previous experiments have shown that PDR can be used for the design of FT architectures in SRAM-based FPGAs. The main role of the PDR controller in an FT system is the identification of the faulty PRM and the fast initiation of the reconfiguration process of the faulty module. The main structure and basic parameters of the GPDRC were obtained and described, together with the problem of PRM synchronization after one of the PRMs has been reconfigured. The results of synthesis demonstrate that the GPDRC has a lower overhead than a controller implemented as MicroBlaze.

Future research shall concentrate on a more effective implementation of PRM synchronization in FT architectures and on the relation between the GPDRC and its dependability parameters.

ACKNOWLEDGMENTS

This work was supported by the Grant Agency of the Czech Republic (GACR) No. 102/09/1668 - "SoC circuits reliability and availability improvement", by GACR No. 102/09/H042 - "Mathematical and Engineering Approaches to Developing Reliable and Secure Concurrent and Distributed Computer Systems", by Research Project No. MSM 0021630528 - "Security-Oriented Research in Information Technology" and by the grant "BUT FIT-S-10-1".

REFERENCES

[1] K. Kyriakoulakos and D. Pnevmatikatos, "A novel SRAM-based FPGA architecture for efficient TMR fault tolerance support," in International Conference on Field Programmable Logic and Applications, 2009. FPL 2009. Washington, DC, USA: IEEE Computer Society, 2009, pp. 193–198.
[2] S.-Y. Yu and E. J. McCluskey, "On-line testing and recovery in TMR systems for real-time applications," in ITC '01: Proceedings of the 2001 IEEE International Test Conference. Washington, DC, USA: IEEE Computer Society, 2001, p. 240.
[3] F. G. de Lima Kastensmidt, G. Neuberger, R. F. Hentschke, L. Carro, and R. Reis, "Designing fault-tolerant techniques for SRAM-based FPGAs," IEEE Des. Test, vol. 21, no. 6, pp. 552–562, 2004.
[4] A. Jacobs, A. George, and G. Cieslewski, "Reconfigurable fault tolerance: A framework for environmentally adaptive fault mitigation in space," in International Conference on Field Programmable Logic and Applications, 2009. FPL 2009. Los Alamitos, CA, USA: IEEE Computer Society Press, 2009, pp. 199–204.
[5] J. Heiner, B. Sellers, M. Wirthlin, and J. Kalb, "FPGA partial reconfiguration via configuration scrubbing," in Field Programmable Logic and Applications, 2009. FPL 2009. International Conference on, Aug. 2009, pp. 99–104.
[6] R. Oliveira, A. Jagirdar, and T. J. Chakraborty, "A TMR scheme for SEU mitigation in scan flip-flops," in ISQED '07: Proceedings of the 8th International Symposium on Quality Electronic Design. Washington, DC, USA: IEEE Computer Society, 2007, pp. 905–910.
[7] C. Bolchini, A. Miele, and M. D. Santambrogio, "TMR and partial dynamic reconfiguration to mitigate SEU faults in FPGAs," in DFT '07: Proceedings of the 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems. Washington, DC, USA: IEEE Computer Society, 2007, pp. 87–95.
[8] L. Sterpone, M. Aguirre, J. Tombs, and H. Guzmán-Miranda, "On the design of tunable fault tolerant circuits on SRAM-based FPGAs for safety critical applications," in DATE '08: Proceedings of the conference on Design, automation and test in Europe. New York, NY, USA: ACM, 2008, pp. 336–341.
[9] D. Fay, S. Campbell, G. Miller, and D. Connors, "Teaching fault tolerant FPGA design for aerospace applications," International Symposium on Microelectronics Systems Education, IEEE International Conference on/Multimedia Software Engineering, vol. 0, pp. 61–62, 2007.
[10] J. Torresen, G. Senland, and K. Glette, "Partial reconfiguration applied in an on-line evolvable pattern recognition system," in NORCHIP 2008. Washington, DC, USA: IEEE Computer Society, 2008, pp. 61–64.
[11] X. Iturbe, M. Azkarate, I. Martinez, J. Perez, and A. Astarloa, "A novel SEU, MBU and SHE handling strategy for Xilinx Virtex-4 FPGAs," in International Conference on Field Programmable Logic and Applications, 2009. FPL 2009. Washington, DC, USA: IEEE Computer Society, 2009, pp. 569–573.
[12] M. Straka, J. Kastil, and Z. Kotasek, "Fault tolerant structure for SRAM-based FPGA via partial dynamic reconfiguration," in 13th EUROMICRO Conference on Digital System Design DSD 2010. Washington, DC, USA: IEEE Computer Society, 2010.
[13] M. Straka, J. Kastil, and Z. Kotasek, "Modern fault tolerant architectures based on partial dynamic reconfiguration in FPGAs," in 13th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems. New York, NY, USA: IEEE Computer Society, 2010, pp. 336–341.