Adama Science and Technology University
School of Electrical Engineering and Computing
Department of Computer Science Engineering
Network Architecture Assignment 2
Title: System Software Support for Router Fault Tolerance
Prepared by Ashenafi Workie
Id No PGR/18068/11
Submitted to Dr. N. Satheesh Kumar, Network
Science SIG Leader
Submission date January 27, 2019
Contents
Abstract
1. Introduction
Basic router functionality and architecture
2 Literature Review
3 Research Question
4 Objective
5 Methodology
Classification of network typical faults are:
Generalized Algorithm of Fault Tolerance (GAFT)
6 Conclusion and Future Works
Reference
System Software Support for Router Fault Tolerance
Ashenafi W. Dessalgn
Adama Science and Technology University
Department of Computer Science and Engineering, Email: ashenafiworkie@gmail.com
Abstract
Communication networks have shifted from dedicated physical network devices to high-level
functional software components of those devices. A fault-tolerant network is able to detect
any irregular situation that might result in temporary malfunctions or permanent faults.
Depending on the topologies and protocols used, when an actual fault occurs, multiple alarms
may be generated by multiple network elements. Modern network management helps to determine
the root cause of a fault, regardless of its type, and performs fault detection. Routers are
the most important traffic-handling devices in a communications network: they are the core
network equipment that forwards data packets between computer networks.
1. Introduction
Communication networks have shifted from common physical network devices to
high-level functional software components that provide the same network
functions. Such a functional software package can run as a software utility
on any hardware platform. These packages are expected to provide
communication through the network in an efficient and transparent manner.
Network efficiency relies on how well these components tolerate faults, which
may be temporary malfunctions or permanent faults. A system that provides
full functionality for its application and can recover transparently from
predefined faults is called a fault-tolerant system (Schagaev and Zalewski,
2001).
A fault-tolerant network is able to detect any unexpected situation that
might result in anything from a temporary malfunction (lasting seconds) to a
permanent fault (lasting several days). There are two major approaches to
increasing the reliability of a system with respect to faults: 1) fault
prevention and 2) fault tolerance. Removing all faults from a system is in
most cases impossible, or it causes other problems such as delays or
difficulty in maintenance. Reliability therefore relies on fault-tolerance
algorithms that manage faults according to the following scenario [5].
Depending on the topologies and protocols used, when an actual fault occurs,
multiple alarms may be generated by multiple network elements. This is the
fault detection phase. Modern network management helps to determine the main
cause of a fault, regardless of its type, and performs fault detection, fault
isolation, and fault localization simultaneously.
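The alarm scenario above can be sketched in a few lines. This is only an illustration under stated assumptions: the topology map `UPSTREAM`, the element names, and the rule "an alarmed element whose upstream neighbour is healthy is a root cause" are hypothetical and are not taken from [5].

```python
from typing import Dict, Optional, Set


def root_causes(upstream: Dict[str, Optional[str]], alarmed: Set[str]) -> Set[str]:
    """Return the alarmed elements whose upstream neighbour is not alarmed.

    Assumption: an alarm raised behind another alarmed element is treated
    as a symptom of that element, not as an independent fault.
    """
    causes = set()
    for node in alarmed:
        parent = upstream.get(node)
        if parent is None or parent not in alarmed:
            causes.add(node)
    return causes


# Hypothetical topology: each element maps to its next hop toward the
# management station (None marks the element closest to it).
UPSTREAM = {
    "core": None,
    "edge1": "core",
    "edge2": "core",
    "host-a": "edge1",
    "host-b": "edge1",
}

if __name__ == "__main__":
    # Three alarms are raised, but only edge1 is reported as the root cause.
    print(root_causes(UPSTREAM, {"edge1", "host-a", "host-b"}))
```

Real network management systems use far richer correlation rules; the sketch only shows why several alarms can collapse to a single fault location.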
Routers are the most important traffic-handling devices in a communications
network [2]: they are the core network equipment that forwards data packets
between computer networks. Besides the fundamental packet routing capability,
modern routers incorporate a variety of extended functions such as traffic
management, packet filtering, and virtual private networks (VPNs). For this
reason, router systems are becoming extremely complex, which creates high
barriers to network innovation. Muhammad Azam et al. argue that the
sustainability of router hardware and software determines the efficiency of
a particular network. Depending on the character and severity of the fault,
the communication system may be down for anything from a few microseconds to
several days. A hardware fault may cause errors in data transmission. The
normal way to recover from an erroneous reception of data is to request
retransmission of the affected packet(s). However, the fault might have
happened inside a router during path computation, path determination, and/or
packet encapsulation. In this case, a packet could be corrupted while being
read from the input buffer, written into the processor cache, processed, or
written into the output buffers. Thus the reliable operation of a router is
critically important for network efficiency and performance.
Basic router functionality and architecture
Generally, a router performs two major tasks: 1) control-path processing
(routing) and 2) data-path processing (switching). Routers maintain and
manipulate routing tables; they listen for updates and change the routing
tables to reflect the new network topology. The network topology in the core
of the Internet and in organizational networks is largely dynamic and changes
very often. Routers also divide packets and perform control actions on them;
they perform Layer 3 switching and sometimes maintain statistical data on the
data flow. Typically, packets are received on an inbound network interface,
handled by the processing module (CPU), and possibly placed in the buffering
module.
Figure 1 Conventional router architecture [16]
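The split between the two tasks can be illustrated with a minimal sketch. It assumes a routing table keyed by IP prefix and a longest-prefix-match lookup; the class name `RoutingTable`, its methods, and the interface labels are hypothetical and do not describe any particular router.

```python
import ipaddress
from typing import Dict, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class RoutingTable:
    """Toy routing table: prefix -> outbound interface name."""

    def __init__(self) -> None:
        self._routes: Dict[Network, str] = {}

    # Control path: apply routing updates that reflect topology changes.
    def update(self, prefix: str, next_hop: str) -> None:
        self._routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix: str) -> None:
        self._routes.pop(ipaddress.ip_network(prefix), None)

    # Data path: choose the outbound interface for one packet.
    def lookup(self, destination: str) -> Optional[str]:
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self._routes if addr in net]
        if not matches:
            return None  # no route: the packet would be dropped
        best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
        return self._routes[best]


if __name__ == "__main__":
    table = RoutingTable()
    table.update("10.0.0.0/8", "if0")
    table.update("10.1.0.0/16", "if1")
    print(table.lookup("10.1.2.3"))   # more specific /16 route
    table.withdraw("10.1.0.0/16")     # topology change on the control path
    print(table.lookup("10.1.2.3"))   # falls back to the /8 route
```

A linear scan over all prefixes is used only for clarity; production routers use specialised lookup structures in the data path.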
2 Literature Review
In recent years, several approaches have been investigated to provide good
fault-tolerance support in routers.
[1] presents improvements to router reliability using the generalized
algorithm of fault tolerance (GAFT), based on time, structure, and
information redundancy. A limitation of that work is that the separate
toleration of malfunctions and permanent faults is not well discussed in
terms of its impact on system reliability.
[4] Hossam M. A. Fahmy et al. propose a routing algorithm to handle complex
faults in multicomputer networks with dimension-order routers. They propose
simple changes to the router structure and routing logic, but performance in
terms of bisection utilization and message latency remains a challenge.
[7] The authors improve single-link failure tolerance by reconfiguration and
by defining a new deterministic routing algorithm for all routers on a
cycle-free path around the faulty link.
The following table gives a detailed comparison of the literature.
Table 1 Literature review detail comparative analysis

| Author(s) (year) | Techniques / Parameters | Advantages | Disadvantages (research gap) |
|---|---|---|---|
| [1] M. Azam, N. Ioannides, M. H. Rümmeli, and I. Schagaev (2009) | Reducing router faults; network efficiency and performance. | Improvements to router reliability. Router functionality and options to tolerate faults. | Difficult to cover all hardware roles by software. Security of a software router is difficult. |
| [2] K. Xu, W. Chen, C. Lin, M. Xu, D. Ma, and Y. Qu (2014) | A reconfigurable routing software platform supporting functional modules and a component development environment. | A practical approach is introduced to build an open, flexible, modularized, and reconfigurable router. | System complexity; most commercial router vendors follow a closed development pattern. |
| [3] A. Runge and Armin (2015) | Energy consumption; faster and smaller router design. | NoCs can be used to tolerate failures. Significant energy consumption. Allows a faster and smaller router design. | Bufferless routers can drop packets on collision. A buffered router is needed every time. |
| [4] J. Albrecht (2013) | Applied to current implementations in which a router is partitioned into multiple modules. | Handles complex faults in multicomputer networks. High adaptability to faults. | Bisection utilization and message latency are challenging. |
| [5] H. S. Castro and O. A. De Lima (2013) | Maintaining communication between the network's non-faulty routers. | NoCs have fault tolerance mechanisms. Control mechanism of backup paths. | Backup and control are challenging in some ways. |
| [6] C. Feng, Z. Lu, A. Jantsch, M. Zhang, and Z. Xing | Integrated circuit scaling leads to increased susceptibility to transient and permanent faults. | A fault-tolerant solution for a bufferless network-on-chip. Mechanism to detect both transient and permanent faults. | Input register for each input port. There are no other buffers in the bufferless router. |
| [7] S. Y. Jiang, Y. Liu, J. B. Luo, H. Cheng, and G. Luo | Improved single link failure tolerance; improvement ideology for link failures and router fault tolerance. | Does not require virtual channels. Power consumption will be reduced. | Focuses on a single link or hops. |
| [8] S. Y. Jiang, Y. Liu, J. B. Luo, H. Cheng, and G. Luo | Tolerates multiple faults and keeps the network reliable without losing network performance. | Tolerates multiple faults and keeps the network efficient without losing performance. | Loss of a number of packets. |
3 Research Question
This paper investigates the following questions:
1) What types of faults occur in a communication network?
2) What are the major approaches to increasing the reliability of a system
with respect to faults?
3) What are the basic router architecture and functionality, and what are
its mechanisms for fault detection and recovery?
4) How is the generalized algorithm of fault tolerance (GAFT) used for fault
detection and recovery, and how do the fault-tolerant (FT) routing table and
its flow chart work?
4 Objective
The objective of the study is to improve the performance of the router by
applying the generalized algorithm of fault tolerance (GAFT), which is based
on time, structure, and information redundancy, together with a scheme for
improving router reliability using system software recovery points.
5 Methodology
Classification of network typical faults are:
Line outage: a failure of a circuit;
White noise: caused by thermal energy;
Impulse noise: burst errors caused by, for example, lightning or poor
connections;
Cross-talk: a circuit picks up the signal of an adjacent circuit;
Attenuation: loss of signal strength over distance;
Jitter: caused by variations in frequency modulation and in maximum
amplitude;
Harmonic distortion: incorrect amplification of the input signal.
Such faults need to be tolerated using a general algorithm for fault
detection and recovery.
Generalized Algorithm of Fault Tolerance (GAFT)
GAFT consists of fault detection, fault type identification, faulty
component location, and hardware reconfiguration to achieve a repairable
state, followed by re-establishment of a correct state.
GAFT relies on several types of redundancy:
• HW(I): information redundancy in hardware, that is, extra information kept
for GAFT purposes, such as a redundant line or a 1-bit register used to check
data for errors;
• HW(T): time redundancy in hardware, such as a hardware delay (latch) to
avoid malfunctions caused by signal races;
• SW(S): structural redundancy in software, such as periodically performed
hardware testing procedures;
• SW(I): information redundancy in software, that is, redundancy deliberately
added to the program to recover the system.
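As a small illustration of the "1-bit register used to check data for errors" idea, the sketch below models an even-parity bit in software. The function names `parity_bit` and `check` are hypothetical, and a real HW(I) mechanism would be implemented in hardware rather than in Python.

```python
def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 when the number of set bits in data is odd."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def check(data: bytes, stored_parity: int) -> bool:
    """Return True when data still agrees with the stored parity bit."""
    return parity_bit(data) == stored_parity


if __name__ == "__main__":
    word = b"\x0f"                  # four set bits
    stored = parity_bit(word)       # the redundant bit kept next to the data
    print(check(word, stored))      # unchanged data passes the check
    print(check(b"\x0e", stored))   # a single flipped bit is detected
    print(check(b"\x0c", stored))   # two flipped bits go undetected
```

The last call shows the known limit of a single parity bit: it detects any odd number of flipped bits but misses an even number, which is one reason stronger information redundancy is often used.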
Recovery points can be analyzed mathematically.
Figure 2 GAFT router architecture implementer algorithm
The distributed processing architecture of the
router (central route processing and local
processing subsystem) enables mutual
checking and recovery procedures to be
performed
and excludes the core of the router in terms
of reliability.
Distributed processing architecture [1] (Azam, Ioannides, Rümmeli, and Schagaev, 2009)
[Flowchart nodes, in order of appearance: Fault; Is permanent fault?; Give
location of faulty component; Reconfigure hardware component; Reject faulty
components; Does it affect software component?; Locate faulty program;
Define right recovery point (RP); Recover the system from RP; Continue the
operation]
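The flowchart steps can be read as the following minimal sketch. The branch structure is an assumption, because the arrows of the original figure are not recoverable here: a permanent fault is assumed to trigger location, reconfiguration, and rejection of the component, and a fault that affects software is assumed to trigger recovery from the most recent recovery point. The types `Fault` and `Router` and the function `handle_fault` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fault:
    component: str
    permanent: bool
    affects_software: bool


@dataclass
class Router:
    active_components: List[str]
    recovery_points: List[str]  # oldest to newest, assumed already verified


def handle_fault(router: Router, fault: Fault) -> List[str]:
    """Return the sequence of GAFT-style steps taken for one detected fault."""
    steps: List[str] = []
    if fault.permanent:
        steps.append(f"locate faulty component: {fault.component}")
        steps.append("reconfigure hardware")
        if fault.component in router.active_components:
            router.active_components.remove(fault.component)
        steps.append(f"reject faulty component: {fault.component}")
    if fault.affects_software:
        steps.append("locate faulty program")
        if not router.recovery_points:
            raise RuntimeError("no recovery point available")
        rp = router.recovery_points[-1]  # assumption: newest point is the right one
        steps.append(f"define recovery point: {rp}")
        steps.append(f"recover from {rp}")
    steps.append("continue operation")
    return steps


if __name__ == "__main__":
    router = Router(["cpu0", "line-card-1"], ["rp-1", "rp-2"])
    for step in handle_fault(router, Fault("line-card-1", True, True)):
        print(step)
```

Choosing the newest recovery point is the simplest policy; the text notes that finding the correct recovery point depends on the quality of the checking procedures, so a fuller implementation would validate each candidate first.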
Again, if any inconsistency is detected, the packet is re-read, with n
iterations or fewer if a re-read succeeds. Finally, for the router outbound
segment, a process of checking and repetition is implemented together with
the automatic formation of recovery points (mentioned as redundant
information generation).
Note that these two processes have a semantic difference: the checking and
formation of recovery points is synchronous and is performed constantly
throughout the routing process. In turn, recovery actions and repeated reads
from caches are asynchronous and are activated only when a packet integrity
violation is detected.
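The two processes can be sketched as follows. This is an illustration under assumptions: a CRC-32 checksum stands in for the redundant information stored with a recovery point, and `make_recovery_point`, `read_with_retry`, and `flaky_reader` are hypothetical names, not part of GAFT as published.

```python
import zlib
from typing import Callable, Optional


def make_recovery_point(packet: bytes) -> int:
    """Synchronous step: store redundant information (here a CRC-32)."""
    return zlib.crc32(packet)


def read_with_retry(read: Callable[[], bytes], expected_crc: int, n: int) -> Optional[bytes]:
    """Re-read the packet up to n times until it matches the stored checksum.

    Returns the packet on success, or None if all n reads are inconsistent.
    """
    for _ in range(n):
        data = read()
        if zlib.crc32(data) == expected_crc:
            return data
    return None


def flaky_reader(good: bytes, bad_reads: int) -> Callable[[], bytes]:
    """Test helper: the first bad_reads calls return a corrupted copy."""
    state = {"calls": 0}

    def read() -> bytes:
        state["calls"] += 1
        if state["calls"] <= bad_reads:
            return bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit
        return good

    return read


if __name__ == "__main__":
    packet = b"payload"
    crc = make_recovery_point(packet)
    print(read_with_retry(flaky_reader(packet, 1), crc, 3))  # recovers on 2nd read
    print(read_with_retry(flaky_reader(packet, 5), crc, 3))  # gives up after 3 reads
```

The retry loop runs only after a mismatch, which mirrors the asynchronous nature of recovery, while the checksum is computed for every packet, which mirrors the synchronous checking.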
6 Conclusion and Future Works
The approaches to router reliability have been stated. The generalized
algorithm of fault tolerance was proposed to address the problem of
improving reliability in the presence of faults in router hardware
components. Router hardware is the major obstacle to improving reliability;
therefore, it is better to use software support to handle such faults.
Flexible real-time fault-tolerant systems apply different steps of the
algorithm, which gives the designer options. The implementation of the
algorithm assumes support for and coordination of the hardware checking
process, and it is composed of three sets of recovery points, for the
routing, inbound, and outbound hardware of the router respectively. The
recovery procedures include searching for the correct recovery point from
which to restart operation; the probability of success of this procedure
depends on the quality and consistency of the checking procedures. Recovery
actions may be implemented in different router hardware segments, which
reduces the performance degradation of the router as a whole even during
recovery. The distributed processing architecture of the router (central
route processing and local processing subsystems) enables mutual
verification and recovery steps to be performed, excluding the core of the
router, in terms of transparency.
Reference
[1] M. Azam, N. Ioannides, M. H. Rümmeli, and I. Schagaev, “System Software Support for Router
Fault Tolerance,” Networks, no. July 2015, pp. 13–18, 2009.
[2] K. Xu, W. Chen, C. Lin, M. Xu, D. Ma, and Y. Qu, “Toward a practical reconfigurable router: A
software component development approach,” IEEE Netw., vol. 28, no. 5, pp. 74–80, 2014.
[3] A. Runge and Armin, “Fault-tolerant Network-on-Chip based on Fault-aware Flits and Deflection
Routing,” Proc. 9th Int. Symp. Networks-on-Chip - NOCS ’15, no. January, pp. 1–8, 2015.
[4] J. Albrecht, “B 0 → Μ Μ,” vol. 0, pp. 361–366, 2013.
[5] H. S. Castro and O. A. De Lima, “A fault tolerant NoC architecture based upon external router
backup paths,” 2013 IEEE 11th Int. New Circuits Syst. Conf. NEWCAS 2013, 2013.
[6] C. Feng, Z. Lu, A. Jantsch, M. Zhang, and Z. Xing, “Addressing transient and permanent faults in
NoC with efficient fault-tolerant deflection router,” IEEE Trans. Very Large Scale Integr.
Syst., vol. 21, no. 6, pp. 1053–1066, 2013.
[7] S. Y. Jiang, Y. Liu, J. B. Luo, H. Cheng, and G. Luo, “Study of fault-tolerant routing algorithm of
NoC based on 2D-Mesh topology,” 2013 IEEE Int. Conf. Appl. Supercond. Electromagn.
Devices, ASEMD 2013, no. 41301460, pp. 189–193, 2013.
[8] R. Xie, J. Cai, X. Xin, and B. Yang, “Low-cost adaptive and fault-Tolerant routing method for 2D
network-on-chip,” IEICE Trans. Inf. Syst., vol. E100D, no. 4, pp. 910–913, 2017.
[9] Y. Chawathe and E. A. Brewer, “System Support for Scalable and Fault Tolerant,” Manager, no.
12421, pp. 1–34, 1999.
[10] W. Fu, T. Song, S. Wang, and X. Wang, “for Energy Efficient Router,” pp. 139–140, 2012.
[11] T. Meyer, D. Raumer, F. Wohlfart, B. E. Wolfinger, and G. Carle, “Low latency packet
processing in software routers,” Proc. 2014 Int. Symp. Perform. Eval. Comput. Telecommun.
Syst. SPECTS 2014 - Part SummerSim 2014 Multiconference, pp. 556–563, 2014.
[12] W. Cerroni, C. Raffaelli, and M. Savi, “Optical router architecture to enable next generation
network services,” Int. Conf. Transparent Opt. Networks, pp. 1–4, 2011.
[13] Van der Wal, “... Pyroxenite Layers,” vol. 14, no. 7, pp. 839–846, 1992.
[14] Y. Kai, Y. Wang, and B. Liu, “GreenRouter: Reducing power by innovating Router’s
architecture,” IEEE Comput. Archit. Lett., vol. 12, no. 2, pp. 51–54, 2013.
[15] K. Li, X. J. Lu, and J. P. Li, “Fast forwarding system for centralized router,” 2008 Int. Conf.
Apperceiving Comput. Intell. Anal. ICACIA 2008, no. Mc, pp. 315–318, 2008.