1. FAULT TOLERANCE IN FPGA-BASED SYSTEMS
CSE661-Milestone 3
Karisma Ramesh
451126715
CSE661 Milestone-3 1
2. What is an FPGA?
• A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a
customer or a designer after manufacturing, hence "field-programmable". The FPGA
configuration is generally specified using a hardware description language (HDL), similar to that
used for an application-specific integrated circuit (ASIC).
3. FPGA ARCHITECTURE
In general, FPGAs consist of regular arrays of programmable logic blocks (PLBs) connected to each other by a
programmable routing matrix. An FPGA configuration defines the functionality of an FPGA, specifying which
logic blocks are used and which wire segments are used to connect them, as well as what functionality each
block provides. As in Figure
4. What is Fault Tolerance?
• A fault can be defined as a physical occurrence within an FPGA that causes it to malfunction, such as a
wire broken during manufacture by a dust particle. Faults usually occur at the beginning and end of a
chip's life cycle. Fabrication faults, or defects, are usually caused by contaminants or other flaws in the
manufacturing process and are detected during manufacturing test. Late-life faults are usually due to failure
of device resources.
System failure rate during the life cycle of an FPGA.
5. Methods Of Fault Detection
• 1. Redundant/concurrent error detection uses additional logic as a
means of detecting when a logic function is not generating the
correct output.
• 2. Off-line test methods cover any testing which is carried out when
the FPGA is not performing its operational function.
• 3. Roving test methods perform a progressive scan of the FPGA
structure by swapping blocks of functionality with a block carrying out
a test function.
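The first method above, redundant/concurrent error detection, can be illustrated with a minimal duplication-with-comparison sketch. This is only an assumed example: the entity name dwc_and, the AND function being protected, and the port names y and err are hypothetical and do not come from the slides.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: duplication-with-comparison error detection.
-- The names dwc_and, y and err are illustrative only.
entity dwc_and is
  port (a, b : in  std_logic;
        y    : out std_logic;   -- functional output
        err  : out std_logic);  -- '1' when the two copies disagree
end dwc_and;

architecture Behavioral of dwc_and is
  signal y_main, y_copy : std_logic;
begin
  -- Two copies of the same logic function (the "additional logic")
  y_main <= a and b;
  y_copy <= a and b;

  -- Forward one copy and flag any mismatch between the two
  y   <= y_main;
  err <= y_main xor y_copy;
end Behavioral;
```

A synthesis tool may merge the two identical copies unless it is told to keep them; how to do that is tool-specific and is not shown here.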
6. Methods Of Fault Tolerance
Various methods are used to implement fault tolerance in FPGA systems; some of them are listed
here:
• Single Fault Tolerance: Single faults are mostly transient faults caused by extraordinary conditions in the
environment. Examples are charged particles striking the FPGA while it is in space, or radiation from
radioactive materials depositing energy in the FPGA's vulnerable circuits. Because these single-event upset
(SEU) faults are so common, many methods have been devised to mitigate them.
• Multiple Fault Tolerance: While single faults account for many of the problems that FPGA systems encounter,
there are some environments or applications where multiple faults can happen simultaneously. Some
of these situations arise because feature size continues to decrease, which by itself can cause many
faults in a manufactured device.
• Hardware level fault tolerance: Hardware level repair performs a correction such that the FPGA remains
unchanged for the purposes of the configuration. The device retains its original number and arrangement of
useable logic clusters and interconnects.
7. Methods Of Fault Tolerance (continued)
• Configuration level Fault Tolerance: Tolerance is achieved using resources that are unused by the design. The spare
resources can replace faulty ones in the event of a fault.
• System level Fault Tolerance: Repair works at a higher level. When a design is highly modular, a fault can be
tolerated by the use of a spare functional block or by providing degraded performance. Such methods are
not considered in more detail here, as they are not limited in application to FPGAs.
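As one illustration of single fault tolerance, a widely used mitigation for transient upsets is triple modular redundancy (TMR), in which three copies of a module feed a majority voter. The slides do not name TMR, so the sketch below is an assumed example; the entity name tmr_voter and its ports are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: majority voter for triple modular redundancy.
-- r0, r1 and r2 are assumed to be the outputs of three identical modules.
entity tmr_voter is
  port (r0, r1, r2 : in  std_logic;
        v          : out std_logic);  -- majority result
end tmr_voter;

architecture Behavioral of tmr_voter is
begin
  -- The output follows any two inputs that agree, so a single
  -- faulty module output is masked.
  v <= (r0 and r1) or (r0 and r2) or (r1 and r2);
end Behavioral;
```

The voter masks a fault in any one of the three modules, but it does not protect against a fault in the voter itself or against two modules failing at once.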
8. Open source code
VHDL Code for Generate/Propagate block correction in 8-bit Kogge-Stone Fault Correcting Adder
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Generate/propagate block for one bit position
entity GPblock is
  port (a, b : in  std_logic;
        g, p : out std_logic);
end GPblock;

architecture Behavioral of GPblock is
begin
  g <= a and b;  -- generate: this bit position produces a carry
  p <= a xor b;  -- propagate: this bit position passes an incoming carry
end Behavioral;
9. VHDL Code for mux correction in 8-bit Kogge-Stone Fault Correcting Adder
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- 2-to-1 multiplexer on 2-bit vectors
entity mux is
  port (x, y : in  std_logic_vector(1 downto 0);
        z    : out std_logic_vector(1 downto 0);
        sel  : in  std_logic);
end mux;

architecture Behavioral of mux is
  -- Declared in the original code but not used by the process below
  constant delay : time := 100 ns;
begin
  mux_proc : process (x, y, sel)
    variable temp : std_logic_vector(1 downto 0);
  begin
    case sel is
      when '0'    => temp := x;
      when '1'    => temp := y;
      when others => temp := "XX";  -- unknown select gives unknown output
    end case;
    z <= temp;
  end process mux_proc;
end Behavioral;
10. VHDL Code for sum correction in 8-bit Kogge-Stone Fault Correcting Adder
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Sum bit for one position: propagate signal XOR incoming carry
entity sum is
  port (p, c : in  std_logic;
        s    : out std_logic);
end sum;

architecture Behavioral of sum is
begin
  s <= p xor c;
end Behavioral;
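A quick way to check the generate, propagate, and sum equations used in the blocks above is a small self-checking simulation. The sketch below is an assumed example, not part of the original design: the testbench name gp_sum_tb is hypothetical, the equations are repeated inline so the block is self-contained, and the carry-out expression g or (p and c) is the standard carry-lookahead relation rather than something stated in the slides.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical self-checking testbench for the g, p and s equations.
entity gp_sum_tb is
end gp_sum_tb;

architecture sim of gp_sum_tb is
  signal a, b, c : std_logic := '0';
  signal g, p, s : std_logic;
begin
  -- Same equations as the GP block and the sum block
  g <= a and b;
  p <= a xor b;
  s <= p xor c;

  check : process
    type sl_pair is array (0 to 1) of std_logic;
    constant vals : sl_pair := ('0', '1');
  begin
    -- Apply all eight combinations of a, b and carry-in c
    for i in 0 to 1 loop
      for j in 0 to 1 loop
        for k in 0 to 1 loop
          a <= vals(i);
          b <= vals(j);
          c <= vals(k);
          wait for 10 ns;
          -- Sum bit is the low bit of i + j + k
          assert s = vals((i + j + k) mod 2)
            report "sum mismatch" severity error;
          -- Carry out is the high bit of i + j + k
          assert (g or (p and c)) = vals((i + j + k) / 2)
            report "carry mismatch" severity error;
        end loop;
      end loop;
    end loop;
    wait;  -- stop the process once all cases are checked
  end process check;
end sim;
```

If the equations are correct, the simulation should finish without any assertion reports.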
11. Conclusions
• FPGAs are a very important computing resource for many different fields in the world today. Their
reconfigurability allows for incredible flexibility and reuse. But this benefit comes with a cost.
• No single fault tolerance (FT) methodology is significantly better than the others. The best general solution to
FPGA FT is probably a combination of device-level (DL) and configuration-level (CL) fault tolerance methodologies.
The most likely future advancement in fault tolerance will be in the area of self-adaptation in the presence of
faults, which would allow FPGAs to remain fault tolerant across environments. For detection and diagnosis, the
focus remains on improving speed, coverage, and overhead.