IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 1, FEBRUARY 2008

shown to cover a good percentage of dual-link failures –, these cases often include links that are far away from each other. Considering the fact that these algorithms are not developed for dual-link failures, they may serve as an alternative to recover from independent dual-link failures. However, reliance on such approaches may not be preferable when the links close to one another in the network share resources, leading to correlated link failures.

Dual-link failures may be modeled as shared risk link group (SRLG) failures. A connection established in the network may be given a backup path under every possible SRLG failure. This approach assumes a precise knowledge of failure locations to reconfigure the failed connections on their backup paths. An alternative is to protect the connections using link protection, where only the nodes adjacent to the failed link (and those involved in the backup path of the link) will perform the recovery. The focus of this paper is to protect end-to-end connections from dual-link failures using link protection.

A. Dual-Link Failure Resiliency With Link Protection

Assume that two links, and , failed one after the other (even if they occur together, assume that one failed first followed by the other) in a network. The backup path of the first failed link is analogous to a connection (at the granularity of a fiber) established between two nonadjacent nodes in the network with link removed. The connection is required to be protected against a single-link failure. Therefore, strategies developed for protecting connections against single-link failures may be directly applied for dual-link failures that employ link protection to recover from the first failure.

Dual-link failure resiliency strategies are classified based on the nature in which the connections are recovered from first and second failures. The recovery from the first link failure is assumed to employ link protection strategy. Fig. 1 shows an example network where link 1-2 is protected by the backup path 1-3-4-2. The second protection strategy will refer to the manner in which the backup path of the first failed link is recovered.

Fig. 1. Link 1-2 protected by backup path 1-3-4-2 when failed.

Link Protection—Failure Independent Protection (LP-FIP): One approach to dual-link failure resiliency using link protection is to compute two link-disjoint backup paths for every link. Given a three-edge-connected¹ network, there exist three link-disjoint paths between any two nodes . Thus, for any two adjacent nodes, there exist two link-disjoint backup paths for the link connecting the two nodes. Let and denote the two link-disjoint backups for link . Without loss of generality, assume that, on failure of link , a connection routed along link will be rerouted on . If any link in the backup path fails, the backup path of will be reconfigured to . Hence, the nodes connected to link must have the knowledge of the failure in its backup paths (not necessarily the location). Fig. 2 shows the backup paths assigned by LP-FIP for link 1-2 under dual-link failures. The backup path is identical under any second failure that affects the path 1-3-4-2. When the second failure occurs, a failure notification must be sent to nodes 1 and 2, although this notification need not explicitly mention which link failed in path 1-3-4-2.

¹A k-edge-connected graph is simply referred to as a k-connected graph in the rest of this paper.

Link Protection—Failure Dependent Protection (LP-FDP): For every second failure that affects the backup path, a backup path under dual-link failure is provided. This backup path is computed by eliminating the two failed links from the network and computing the shortest path between the specific node pairs. Fig. 3 shows the backup path assigned for link 1-2 under dual-link failures. It may be observed that the backup path assignment is different for different failures that affect the path. When a second link failure occurs, a failure notification must be sent to node 1, explicitly mentioning the failure location in the path 1-3-4-2. It is fairly straightforward to see that the average backup path length under dual-link failures using LP-FDP will be less than that using LP-FIP. Every link is assigned one backup path for single-link failure and multiple backup paths (depending on the number of links in the backup path for the single-link failure) under dual-link failures.

Link Protection—Link Protection (LP-LP): Notification of the second failed link to different nodes for them to reconfigure their backup paths may result in a high recovery time. In order to avoid notification to the other nodes and reconfiguring at the end of the paths, link protection may be adopted to recover from the second link failure as well. Fig. 4 shows how the backup path 1-3-4-2 is reconfigured after the second failure. Under this strategy, every link will have only one backup path (for all failure scenarios). In order for this strategy to work, the backup path under the second failure must not pass through the first failed link. This constraint is referred to as the backup link mutual exclusion (BLME) constraint. Note that the above approach would fail if the backup path for link 3-4 is 3-1-2-4 for the example in Fig. 4.

Assume we have a network , where denotes the set of nodes and denotes the set of links, and a set of dual-link failures where each element contains at most two links that can fail and that the failure does not disconnect the network. Then, for every link , identify a backup path , such that, if link is in and both links and can fail together, then the backup path of link does not use link .

The problem is concisely described as follows. Given and , identify such that as

B. Background and Prior Work

A network must be three-connected for it to be resilient to any two arbitrary link failures, irrespective of the protection strategy employed. In  and , a heuristic solution to the
RAMASUBRAMANIAN AND CHANDAK: DUAL-LINK FAILURE RESILIENCY THROUGH BLME

BLME problem for arbitrary dual-link failures is developed. The heuristic solution, referred to as the Maximum Arbitrary Double-Link Protection Algorithm (MADPA), is shown to find backup paths satisfying the BLME constraint for all links in two networks out of the three networks considered in their paper; however, it is not guaranteed to find a solution even if one exists. In , a polynomial time algorithm is developed for solution to the BLME problem considering only adjacent link failures. To the best of our knowledge, there is no prior work that establishes the existence of a solution to the BLME problem.

Fig. 2. Dual-link failure resiliency using LP-FIP. Backup path after the second failure remains the same irrespective of the failure.

Fig. 3. Dual-link failure resiliency using LP-FDP.

Fig. 4. Dual-link failure resiliency using LP-LP.

C. Contribution

This paper develops the necessary theory to prove the sufficient conditions for the existence of a solution to the BLME problem. Solution methodologies to the BLME problem are developed using two approaches by: 1) formulating the BLME problem as an integer linear program (ILP) and 2) developing a polynomial time heuristic algorithm. The tradeoffs involved in solutions that have and do not have the precise knowledge of the failure location are compared by applying the solution techniques to six networks. In addition, the paper also establishes the potential benefits of a four-connected network over a three-connected network when the knowledge of the failure location is available. This paper assumes that every link employs one primary fiber and the failure recovery using link protection is performed at the granularity of a fiber. Hence, issues relating to specific connections flowing through a link (such as wavelength continuity constraint) are implicitly taken care of. It is also assumed that the failure of a link results in the failure of all the three fibers on that link.

D. Organization

The remainder of this paper is organized as follows. Section II develops the necessary theory for the BLME problem and establishes the sufficient condition for the existence of a solution. Section III describes the formulation of the BLME problem as an ILP with some specific insights into the objective function and formulation for networks that may not be three-connected. Section IV develops a polynomial time approximation algorithm, called Iterative Minimum Cost Path heuristic. Section V discusses the possible loop formation under the BLME approach and identifies a solution to resolve such loop formations. Section VI describes the results obtained by applying the ILP formulation and heuristic to six networks. Section VII shows the advantages of designing a four-connected network over a three-connected network given that the location of two failures are known. Section VIII concludes the paper.

II. THEORY

Consider a network represented as a graph , where and denote a set of nodes and undirected links, respectively. The nodes are numbered from 1 through . A link is assumed to be bi-directional. Let and denote the identifiers of the nodes connected by link such that . Let represent the set of directional links, or arcs, in the network. An arc from node to is denoted as .
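As an illustration of how the BLME constraint from Section I-A can be checked once links are identified by their end nodes, the following minimal Python sketch may help. It is not the authors' implementation: the helper names (`link`, `path_links`, `blme_violations`) and the dictionary layout are assumptions made for this example, and node 5 in the second assignment is hypothetical, since the topology of Fig. 4 is not reproduced here. A backup path is written as a node sequence, and a dual-link failure as a pair of links.

```python
def link(u, v):
    """Canonical undirected link: the smaller node identifier comes first."""
    return (u, v) if u < v else (v, u)


def path_links(nodes):
    """Set of undirected links traversed by a path given as a node sequence."""
    return {link(a, b) for a, b in zip(nodes, nodes[1:])}


def blme_violations(backup, failures):
    """Return the dual-link failures that violate the BLME constraint.

    backup   : dict mapping each link to its backup path (node sequence)
    failures : iterable of (link, link) pairs that may fail together

    A pair (l1, l2) is a violation when the backup path of l1 uses l2
    and the backup path of l2 uses l1.
    """
    used = {l: path_links(p) for l, p in backup.items()}
    return [(l1, l2) for l1, l2 in failures
            if l2 in used[l1] and l1 in used[l2]]


# Dual-link failure of links 1-2 and 3-4, as in the example of Fig. 4.
failures = [(link(1, 2), link(3, 4))]

# Assignment named in the text as failing: link 3-4 backed up by 3-1-2-4.
bad = {link(1, 2): [1, 3, 4, 2], link(3, 4): [3, 1, 2, 4]}

# Hypothetical alternative: link 3-4 backed up through an assumed node 5.
ok = {link(1, 2): [1, 3, 4, 2], link(3, 4): [3, 5, 4]}

print(blme_violations(bad, failures))
print(blme_violations(ok, failures))
```

In the first assignment each backup path traverses the other failed link, so the pair is reported; in the second, the backup of link 3-4 avoids link 1-2 and no violation is found.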
The failure of link is assumed to affect the arcs in both directions. Let denote the set of dual-link failures to be tolerated. An element consists of exactly two undirected links or, correspondingly, four directed arcs.

A. Sufficient Condition for Existence of a Solution

Three-edge-connectivity is a necessary condition for a network to be resilient to dual-link failures. It is also sufficient that a network is three-edge-connected in order to obtain a solution for BLME problem, proved as follows.

Assume that the given network is divided into auxiliary graphs. An auxiliary graph is constructed by removing link from the original network: . In each auxiliary graph , the goal is to identify a path from node to . Let be a binary variable that indicates whether link is present in the backup path of link : 1 if true, 0 otherwise. Let be the Boolean function that denotes the connectivity between nodes and in the auxiliary graph , represented as a function of the variable set .

Consider the example network shown in Fig. 5(a). The auxiliary graph corresponding to link 1 is shown in Fig. 5(b). The Boolean function representing the connectivity between nodes A and B is shown in (1) and (2) at the bottom of the page in sum-of-product and product-of-sum forms, respectively.

Fig. 5. (a) Example network. (b) Network after failure of link 1.

It may be observed that is a unate function and, hence, has a trivial solution. If the function evaluates to 1 (true) for some input combination of , then there exists a path between the nodes and in .

Observation 1: The connectivity functions and are independent of each other for any two distinct links , i.e., the functions and do not have any variables in common.

Observation 2: If for some , then the network is one-edge connected. The failure of link disconnects the network. Conversely, if a network is at least two-edge-connected, then , .

The different connectivity functions are related to each other through the BLME constraint. The BLME constraint corresponding to a dual-link failure is written in the sum-of-product and product-of-sum forms, as shown in (3) and (4), respectively.

The BLME problem is then written as a Boolean satisfiability problem, denoted by , as shown in (5). It is observed that is a function of the set of variables . If the Boolean function is identically 0 for all input combinations, then the BLME problem does not have a feasible solution. If evaluates to a non-zero function, then there exists an input combination for which the function evaluates to 1 (true).

Theorem 1: If a network is at least two-edge-connected and , then there exists a dual-link failure that disconnects the graph.

Proof: Given that the network is at least two-edge-connected, , . Clearly, , . Therefore:
1) ;
2) .
Hence, for , the conjunction of a combination of connectivity terms with one of the BLME constraints results in an identically zero-function.

A dual-link failure scenario involving links and has the BLME constraint as shown in (3). If the BLME constraint corresponding to dual-link failure combines with the conjunction of the connectivity functions resulting in an identically zero-function, then the conjunction of the connectivity terms must take the form as shown in (6). Note that the first term of (6) cancels the BLME constraint involving the links and resulting in a zero function for .

(1) (2) (3) (4) (5) (6) (7)
For any two distinct links and , and are independent of each other. Hence, (6) implies that and must be of the form

The above equations imply that, upon failure of link , any path from nodes and must traverse link and, on failure of link , any path from to must traverse link . Links and are mutually dependent on each other for their backup paths. Therefore, the dual-link failure involving links and disconnects the network.

Corollary: Given a three-edge-connected network, there exists a solution to the BLME problem under any arbitrary two link failures.

Proof: The corollary follows from Theorem 1 by considering all dual-link combinations in . Assume that the BLME problem does not have a solution for a three-edge-connected network. Hence, . By Theorem 1, there exists a dual-link failure that disconnects the network. However, no two link failures can disconnect the network as the network is three-edge-connected, resulting in a contradiction. Hence, . Clearly, (identically 1 for all input combinations). Hence, for a three-edge-connected network, evaluates to a nontrivial (non-zero, non-unity) Boolean function and thus must evaluate to 1 for some input combination.

III. INTEGER LINEAR PROGRAM FORMULATION

The BLME problem is formulated as an Integer Linear Program (ILP) using undirected links. The central idea behind this ILP formulation is to view the network as distinct graphs. Each graph, denoted as , will provide a backup path for link . Equivalently, each graph will have a ring traversing through link . Let denote the existence of a failure such that and ; 1 if true, 0 otherwise.

Let be a binary variable that indicates whether link is present in graph : 1 if present, 0 otherwise. Similarly, let be a binary variable that indicates whether node is present in graph or not: set to 1 if present, 0 otherwise. Let denote whether node is attached to link or not: 1 if true, 0 otherwise.

The formulation of backup path selection for all links satisfying the BLME constraint is shown in Fig. 6. The objective function is set to minimize the sum of the backup path lengths of all links or, equivalently, the average backup path length of a link under a single-link failure. The average backup path length under single-link failure, denoted by , is computed as

(8)

Fig. 6. ILP formulation of backup path selection with BLME constraint.

The constraints ensures that a graph must contain a ring with link present in the ring by forcing the corresponding link variables to take a value of 1. The constraint ensures that, for two links and that belong to , if link is present in , then is not present in graph if the two links and may be unavailable at the same time. Otherwise, such a restriction is not imposed.

The constraint RC ensures that every graph has a ring and every node that is present in a graph must have exactly two outgoing (or incoming) links. The above constraint introduces additional variables to the formulation, however, they are strongly correlated to the link variables . The variables employed in the formulation are limited to take binary values using Bounds.

A. Restricting to at Most One Ring in a Graph

The ring constraints ensure that there will be at least one ring which includes a particular link in that graph, however, it does not ensure that there will be exactly one ring in the graph. The formulation in Fig. 6 restricts the number of rings formed in a graph to at most one implicitly through the objective function by optimizing on the number of links in the backup path.

There are scenarios in which the formation of exactly one ring in a graph may need to be explicitly mentioned. First, for large problems, the ILP may not result in an optimal solution; however, it may provide intermediate feasible solutions that are significantly far away from optimal. Even if the objective function implicitly restricts the formation of only one ring per graph, suboptimal solutions may still have multiple rings in a graph. Second, if the goal is to just obtain any feasible solution, then one might specify an objective function that is simply a constant, such as . The objective function always takes the value ; therefore, any feasible solution is an optimal solution. Solutions with such objectives may also result in multiple rings in a graph.

A graph that has exactly one ring present in it has a path from every node in the graph to every other node in the graph. A graph that has multiple node-disjoint rings would fail this requirement. Alternatively, if we elect at most one master node that is present in the graph, then every node that is present in the graph must have a path to the master node. Given the nature of formulation,
it is obvious that a graph must at least have nodes and present in it. Let the master node be . Every node except the master node is assumed to source one unit capacity of traffic that sinks at the master node. The constraints shown in Fig. 7 achieve this purpose, where is an arbitrarily large number and is the capacity of flow routed on arc in graph . The constraint SRC 1 specifies that the master node sinks flows. SRC 2 specifies that every node except the master must source one unit of flow if they are present, otherwise none. The constraint SRC 3 specifies that the paths found for these flows may use only those links that are present in the graph. Every link present in the graph is assumed to have an arbitrarily large capacity .

Fig. 7. Flow constraints to ensure formation of single ring in a graph, if the objective function does not capture the requirement inherently.

The variable is formulated as real-valued, however, it will be restricted to integers due to the above constraints. As every graph is assumed to have a counter-rotating ring present, it is sufficient to specify the constraint for one direction. Note that an explicit representation of this constraint requires flow-based formulation that requires consideration of "directional links."

B. Formulation for Networks That are Not Three-Connected

For a network to be resilient to any two link failures, the network must be three-edge-connected. However, some real-life networks may not be three-edge-connected; hence, certain dual-link failures may disconnect the network. For example, in the NJ-LATA network of Fig. 10(f), the network will be disconnected under three dual-link failures involving link pairs 1 and 2, 11 and 16, and 22 and 23. If a dual-link failure involving two links and disconnects the network, it will not be possible to find backups for links and satisfying the BLME constraint as they will have to include each other. The BLME constraint is relaxed for these failure scenarios by not considering these dual-link failures. In other words, is assigned 0 (assumed not to fail) even though the links can be unavailable at the same time, if their joint failure will leave the network disconnected.

C. Minimizing Spare Capacity

If the objective is to minimize the spare capacity allocated in the network, then links may be removed from the given network until the transformed network just meets the necessary conditions for the existence of a solution. For a given network , let denote the number of link-disjoint paths between the nodes and . If there exist more than three link-disjoint paths, then is truncated to three, as three-edge-connectivity is a sufficient condition for the existence of a solution to the BLME problem. Let denote the transformed graph, where , such that the connectivity between all node pairs is maintained in graph . Given , the spare capacity on a link may be computed as follows: 1) if a link is required to maintain three-connectivity between two node pairs in , then it requires two spare fibers; 2) if a link is required to maintain two-connectivity between two node pairs, then it requires one spare fiber; and 3) if a link is required to maintain (one-)connectivity between two node pairs, it does not require any spare fibers. The links removed from the given network need not be equipped with spare fibers.

If a given network is three-edge-connected, then the objective is to keep the graph three-edge-connected with the minimum number of links, which is well known to be NP-hard. In such a reduced network, every link will employ two spare fibers. When a network is less than three-edge-connected, then a link may still require two spare fibers even though a dual-link failure involving may disconnect the graph. For example, consider the NJ-LATA network shown in Fig. 10(f). There are three dual-link failures that disconnect the graph. However, links 1 and 2 are required to maintain three-connectivity between nodes 2 and 3. Hence, two spare fibers are required on links 1 and 2. On the other hand, the removal of one of the other four links (11, 16, 22, and 23) does not violate the three-connectivity of other nodes. Hence, links 11, 16, 22, and 23 may be equipped with only one spare fiber as long as they are not required to maintain the connectivity between the node pairs when other links are removed. Note that if link 20 is removed, then links 22 and 23 are required to maintain three-connectivity between nodes 9 and 10, hence will require two fibers on each. The increase in spare capacity required in other links due to removal of a link depends on the network topology. It is not necessary that minimizing the spare capacity implies minimizing in a graph that is less than three-connected (or vice versa). If a network is at least three-connected, then minimizing spare capacity is equivalent to minimizing the number of links to keep the graph three-connected.

IV. HEURISTIC APPROACH

As ILP solution times for large networks may be prohibitively high, a heuristic approach is also developed. The heuristic solution is based on iterative computation of minimum cost routing. The network is treated as an undirected graph . A set of auxiliary graphs corresponding to failure of a link is created: . In each auxiliary graph , the objective is to obtain a path between the nodes that were originally connected by link . Let denote the path selected in auxiliary graph . If a link is a part of the path selected on graph , then the path in graph must avoid the use of link . This is accomplished by imposing a cost on the links in the auxiliary graphs and having the path selection approach select the minimum cost path. Let denote the cost of link on graph such that it indicates that graph contains link and the two
links and may be unavailable simultaneously. Hence, the cost values are binary in nature.

The cost of a path in an auxiliary graph is the sum of the cost of links in it. At any given instant during the computation, the total cost of all the paths is the sum of the cost of the paths across all auxiliary graphs. It may be observed that the total cost must be an even number, as every link in a path that has a cost of 1 implies that link in path would also have a cost of 1. For a given network, the minimum value of the total cost would then be two times the number of dual-link failure scenarios that would have the network disconnected. If denotes the number of dual-link failure scenarios that would disconnect the graph, then the termination condition for the heuristic is given by . Fig. 8 shows the steps involved in the Iterative Minimum Cost Path (IMCP) heuristic.

Fig. 8. Steps involved in the IMCP heuristic solution.

The complexity of an iteration of the IMCP heuristic is dictated by the (backup) path-selection step (Step 4.2.), whose complexity is . The complexity of an iteration of IMCP (decided by Step 4) is . The number of iterations required to obtain a solution depends on the connectivity of the network. Although there is a terminating condition specified for this algorithm, the algorithm is not guaranteed to terminate for certain networks. Hence, one can limit the number of iterations to be performed to a fixed threshold. If the algorithm is restricted to a maximum of iterations, the IMCP heuristic is pseudopolynomial with a worst case complexity of .

Note that the weights of the links in the auxiliary graphs are initialized to zero (Step 2 of the IMCP heuristic). The weights may be initialized to a small positive value to obtain backup paths with shorter path lengths. Such a modification, however, would result in a trade-off between the average backup path length and the number of dual-link failures tolerated.

V. LOOP FORMATION

The backup path of a link after its failure is analogous to a connection established in a network which is protected using link protection (for the second failure). Hence, all of the properties of a link protection strategy for a connection in a regular network are valid in dual-link failure resiliency using link protection. Loop formation is one among them.

Consider a failed link (connecting nodes 1 and 8) whose backup is established along a path where the nodes in the path are numbered from 1 through 8. Fig. 9 shows two kinds of loop formation.

Fig. 9. Illustration of loop formation in dual-link failure resiliency with BLME. A solid line indicates a single-hop path. A dotted line indicates a single- or multi-hop path. (a) Looping with a link traversed in opposite directions. (b) Looping with links 2-3 and 6-7 traversed twice in the same direction, thus requiring pruning.

In Fig. 9(a), link 4-5 is protected by the backup path 4-7-6-5. Upon failure of link 4-5, the backup between nodes 1 and 8 is modified as 1-2-3-4-7-6-5-6-7-8, resulting in the loop 7-6-5-6-7. While this loop could be pruned using signaling, it is only necessary to reduce path delay. The backup path for 4-5 will route both its primary fiber and the secondary fiber of link 1-8 through the path 4-7-6-5, hence two spare fibers will be used in each of these links. The loop formation involves the same links, however the links are traversed in opposite directions. As every link needs to be equipped with two spare fibers in each direction (for bi-directional connectivity), there will not be a resource contention.

Fig. 9(b) shows another kind of loop formation where there is a potential resource contention. The backup path of link 4-5 is 4-2-3-6-7-5. Upon failure of link 4-5, the backup between nodes 1 and 8 is modified as 1-2-3-4-2-3-6-7-5-6-7-8, resulting in two loops 2-3-4-2 and 7-5-6-7. Note that links 2-3 and 6-7 are traversed in the same direction. If such a loop formation is allowed, then links 2-3 and 6-7 must be equipped with three spare fibers as the backup path for link 4-5 would be switching two fibers. As the network is assumed to have at most two link failures, it must be sufficient to equip every link with two spare fibers.
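The two loop formations of Fig. 9 can be reproduced with a short Python sketch. This is only an illustration under stated assumptions: the paper prunes loops through signaling between nodes, whereas the code below performs a centralized loop erasure on a node sequence, and the function names `splice_backup` and `prune_loops` are hypothetical.

```python
def splice_backup(path, failed, backup):
    """Replace the failed hop (u, v) in `path` by the backup node sequence.

    `backup` must run from u to v; it is reversed if the path traverses
    the failed link from v to u.
    """
    u, v = failed
    for i in range(len(path) - 1):
        if (path[i], path[i + 1]) == (u, v):
            return path[:i] + backup + path[i + 2:]
        if (path[i], path[i + 1]) == (v, u):
            return path[:i] + backup[::-1] + path[i + 2:]
    return list(path)  # failed link is not on this path


def prune_loops(path):
    """Erase loops: when a node repeats, cut back to its first visit."""
    pruned = []
    position = {}  # node -> index of that node in `pruned`
    for node in path:
        if node in position:
            cut = position[node]
            for dropped in pruned[cut + 1:]:
                del position[dropped]
            del pruned[cut + 1:]
        else:
            position[node] = len(pruned)
            pruned.append(node)
    return pruned


backup_1_8 = [1, 2, 3, 4, 5, 6, 7, 8]  # backup of the failed link 1-8

# Fig. 9(a): link 4-5 is protected by 4-7-6-5.
looped_a = splice_backup(backup_1_8, (4, 5), [4, 7, 6, 5])
print(looped_a, prune_loops(looped_a))

# Fig. 9(b): link 4-5 is protected by 4-2-3-6-7-5.
looped_b = splice_backup(backup_1_8, (4, 5), [4, 2, 3, 6, 7, 5])
print(looped_b, prune_loops(looped_b))
```

For case (b) the spliced sequence is 1-2-3-4-2-3-6-7-5-6-7-8 and loop erasure leaves 1-2-3-6-7-8, so links 2-3 and 6-7 are traversed only once by the path between nodes 1 and 8.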
Therefore, if the links have only two spare fibers, pruning of the backup paths cannot be avoided. The pruned backup path between nodes 1 and 8 after link 4-5 failure would be 1-2-3-6-7-8, while the backup path of link 4-5 would be 4-2-3-6-7-5.

The looping problem described here is similar to those encountered in any link protection mechanism. The paths can be pruned using a signaling mechanism that would be required to establish the backup paths. Note that the signaling cannot be completely avoided as a link can serve as backup for more than two other links, hence protection switches cannot be configured prior to failure.

VI. PERFORMANCE EVALUATION

The performance of the ILP and heuristic algorithm developed in this paper is evaluated by applying them to six networks as shown in Fig. 10: (a) ARPANET, (b) NSFNET,² (c) Node-16, (d) Node-28, (e) Mesh-4×4, and (f) NJ-LATA. All networks except NJ-LATA are three-connected. The NJ-LATA network is not three-connected as nodes 1, 6, and 11 have degree 2 and is considered "as is" for performance evaluation. The formulation for the NJ-LATA network has been modified as outlined in Section III-B. The Node-16 and Node-28 networks are hypothetical networks used to test the limits of the ILP. All of the nodes in these two networks have exactly three links connected to them, thus these two networks are minimally 3-connected.

²The NSFNET network considered here has been modified from the original network with the addition of link numbered 23 to keep the network three-connected.

Fig. 10. Networks considered for performance evaluation. (a) ARPANET (20 nodes, 32 links). (b) NSFNET (14 nodes, 23 links). (c) Node-16 (16 nodes, 24 links). (d) Node-28 (28 nodes, 42 links). (e) Mesh-4×4 (16 nodes, 32 links). (f) NJ-LATA (11 nodes, 23 links).

Dual-link failure scenarios occur in networks due to two reasons, as mentioned in Section I. First, link resources such as conduit or duct are shared by multiple links for ease of layout. Such sharing of resources is typically limited to links that are close to each other, such as adjacent links. Hence, dual-link failure scenarios under such shared resource failure typically affect only nearby links. The second case of dual-link failure scenario is due to the time required to repair a failed link. Before a failed link is repaired, another link in the network could fail; however, such failures are typically rare. If it can be assumed that most of the dual-link failures may be caused because of failures of shared resources, then it is of interest to identify backup path assignments by considering only failures of nearby links.

Three kinds of dual-link failures are considered: 1) any arbitrary two link failures; 2) links that are one node away; and 3) links that are two nodes away. Note that any dual-link failure that will disconnect the network is not considered in computing the number of failures that can be tolerated.

A. Performance Metrics

The performance metrics considered specifically for the ILP solutions are: 1) solution time and 2) optimality bound. The optimality bound is relevant in scenarios where the ILP could not obtain an optimal solution, but has a feasible solution with a known bound on optimality. The ILP is solved using the CPLEX 8.1 solver  on a single-processor Pentium4 2.53 GHz computer with 512 MB of RDRAM. The optimality bounds reported in this paper are those provided by the CPLEX solver. The metric that is considered specifically for heuristic is the number of
dual-link failures that can be tolerated, as the heuristic is not guaranteed to recover from all dual-link failures.

In addition to the metrics that are specific to the ILP and the heuristic, certain metrics common to both approaches are also evaluated: 1) average and maximum backup path length for a single-link failure; 2) average and maximum backup path length under a dual-link failure scenario; and 3) total spare capacity required.

The backup path length under a single-link failure scenario is simply the average of the lengths of the backup paths of all the links. If $h_\ell$ denotes the length of the backup path of link $\ell$, then the average backup path length under a single-link failure scenario, denoted by $\bar{h}_1$, is obtained as

$$\bar{h}_1 = \frac{1}{|\mathcal{L}|} \sum_{\ell \in \mathcal{L}} h_\ell \qquad (9)$$

where $\mathcal{L}$ denotes the set of links. The maximum backup path length under a single-link failure scenario is obtained as

$$h_1^{\max} = \max_{\ell \in \mathcal{L}} h_\ell \qquad (10)$$

The backup path length of a link under a dual-link failure scenario is computed as: 1) the backup path length of the link alone if the backup path does not use the second failed link and 2) the length after pruning if the backup path employs the second failed link. Let $h_{\ell\ell'}$ denote the length of the backup path of link $\ell$ when links $\ell$ and $\ell'$ have failed. The average and maximum backup path length under dual-link failures, denoted by $\bar{h}_2$ and $h_2^{\max}$, respectively, are computed as

$$\bar{h}_2 = \frac{1}{|\mathcal{F}|} \sum_{(\ell,\ell') \in \mathcal{F}} h_{\ell\ell'} \qquad (11)$$

$$h_2^{\max} = \max_{(\ell,\ell') \in \mathcal{F}} h_{\ell\ell'} \qquad (12)$$

where $\mathcal{F}$ denotes the set of dual-link failures $(\ell, \ell')$ considered.

The additional capacity requirement is computed as follows.
• If a link is not in the backup path of any other link, then no additional capacity is required on that link.
• A link will require 200% additional capacity under two conditions: 1) the link appears in the backup paths of two links that could be unavailable simultaneously; or 2) the link is in the backup path of at least one link, that link is also in the backup path of at least one other link, and these two links can both be unavailable at any given time.
• In all other cases, the link requires one additional fiber, which is a 100% additional capacity.

B. ILP Results

TABLE I. ILP RESULTS FOR TOLERATING ANY TWO-LINK FAILURES

Table I shows the results for the six networks to be resilient to any arbitrary two-link failure obtained using ILP with the objective to optimize the average backup path length under a single-link failure scenario. The CPLEX program terminated due to insufficient memory for the Node-28 and Mesh-4×4 networks. While this is indicative of the complexity of the problem, feasible solutions were obtained as intermediate values. The best value obtained before termination is reported for these networks. It is observed that the solution time increases with an increase in the network size but decreases with an increase in average node degree. Note that the NSFNET and NJ-LATA networks both have 23 links; however, the solution times are significantly different due to their connectivity. For scenarios where an optimal solution is not found, the value of the optimality bound indicates the worst-case deviation of the best value from the optimal. It is to be noted that, although an optimal solution may not be obtained, a feasible solution is obtained for all the networks considered, confirming the existence of a solution.

It is observed that a 200% additional capacity (two spare fibers) is required in all of the links of all of the networks except NJ-LATA. Such a requirement can be immediately deduced from the connectivity of the network. For example, whenever a link is necessary³ to keep the network three-connected, then such a link must have two spare fibers. Thus, the networks Node-16 and Node-28 will require 200% additional capacity even when only adjacent links may fail together. Such a 200% requirement in capacity may be reduced only on those links whose removal does not affect the three-connectivity property of the network. For example, the link between WA and UT in the NSFNET may be removed without affecting the three-connectivity property of the network. However, such a solution would have an increased average backup path length.

³ Removal of the link will result in the network not being three-connected.

For the NJ-LATA network, two of the three link pairs whose
failure disconnects the network were not present in more than one backup path; thus, a reduction of four fibers was obtained. The high connectivity in the NJ-LATA network results in this reduction even when the objective function is not set to minimize capacity, which is purely coincidental. Such a reduction cannot be guaranteed for all networks. It is also observed that the average backup path length under dual-link failures (for the Node-16 and Node-28 networks) may be lower than the average backup path length under single-link failure scenarios due to path pruning.

Typical of any ILP formulation, the solution time increases with the network size. In order to obtain any solution that would satisfy the constraints, the ILP is solved with an objective function whose value is a constant. The results for the six networks are shown for this scenario in Table II. It is observed that the solution times are significantly reduced as compared with those needed to solve for minimizing the average backup path length. The deviation of the average backup path length under single-link failures computed with the trivial objective function from the best/optimal values shown in Table I is observed to be between 23% and 68% for the different networks. These results indicate that accepting any feasible solution in exchange for reduced solution time will result in backup paths whose average length is significantly higher than that of the optimal. It is also observed that the average backup path length under dual-link failures is less than that under single-link failures for more networks, which is a result of path pruning.

TABLE II. ILP RESULTS FOR TOLERATING ANY TWO-LINK FAILURES WITH THE OBJECTIVE FUNCTION SET TO BE CONSTANT (INDEPENDENT OF THE VARIABLES) TO OBTAIN ANY FEASIBLE SOLUTION

TABLE III. ILP RESULTS FOR TOLERATING TWO ADJACENT LINK FAILURES

Table III shows the results obtained from the ILP considering only adjacent link failures. The number of dual-link failure scenarios in such a case decreases, thus leading to a significant improvement in solution time and average hop length. The CPLEX program terminated due to insufficient memory for the Node-28 network. The best integer solution obtained at the time of termination is reported in the table. The backup path length under a single-link failure scenario is considerably reduced when compared to the results for any arbitrary link failures (10.191 to 6.786 for Node-28).

The backup path length under dual-link failures decreases for adjacent failures compared with arbitrary failures, and so does the number of dual-link failures in the network. If the reduction in the backup path length dominates the reduction in the number of dual-link failures, then the average backup path length for adjacent link failures will be less than that of arbitrary link failures; otherwise, it will be higher. The former is the case for the Node-16 and Node-28 networks, but the latter is the case for the NSFNET, ARPANET, and NJ-LATA networks. Mesh-4×4 remains unaffected.

C. Heuristic Results

Tables IV–VI show the results obtained using the IMCP heuristic approach for tolerating any arbitrary two-link failures, dual-link failures within a distance of two, and two adjacent link failures, respectively. Interestingly, this simple heuristic obtains a solution that can tolerate most dual-link failures for
the networks considered; the solution cannot recover from four dual-link failures for Node-28 for any arbitrary two-link failures and one for any adjacent failures.

TABLE IV. PERFORMANCE COMPARISON OF IMCP HEURISTIC AND LP-FDP FOR TOLERATING ANY TWO LINK FAILURES

TABLE V. PERFORMANCE COMPARISON OF IMCP HEURISTIC AND LP-FDP FOR RECOVERING FROM DUAL-LINK FAILURES WITHIN DISTANCE 2

TABLE VI. PERFORMANCE COMPARISON OF IMCP HEURISTIC AND LP-FDP FOR TOLERATING TWO ADJACENT LINK FAILURES

The heuristic produces a solution in relatively few iterations for five of the six scenarios. A maximum of 30 iterations were performed. While the objective of the heuristic is to obtain a feasible solution, it is not guaranteed to find a solution (as seen in the Node-28 network scenario for any arbitrary two-link failure scenario). The results presented in this paper are obtained by iterating over a given ordered set of links. In our study where we considered iterating the auxiliary graphs in the reverse order, the heuristic could not find a feasible solution for the Node-16 and Node-28 network scenarios for arbitrary two-link failure scenarios. The number of iterations required to arrive at the solution depends on many parameters, specifically the order in which the auxiliary graphs are considered and the weights employed. Comparing the results of the heuristic to those of the ILP, it is observed that the heuristic can be as bad as 60% above optimal for average backup path lengths.

Compared with the MADPA heuristic, the IMCP heuristic obtains a solution that recovers from all dual-link failures for ARPANET, while MADPA tolerates only 490 (out of 496) dual-link failure combinations. Comparison with the average path length obtained by MADPA is not performed, as the authors of that work indicated that the results presented there were obtained before pruning.
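The tolerance counts reported above can be illustrated with a minimal sketch. It assumes, following the LP-LP description, that a dual-link failure of two links is tolerated when their backup paths do not both use each other, so that the backup path taken after the second failure never passes through the first failed link. The four-node complete graph, the backup-path assignment, and the helper names (`link`, `is_tolerated`, `count_tolerated`, `single_failure_lengths`) are hypothetical and serve only as an illustration; they are not the paper's networks, ILP, or IMCP implementation.

```python
from itertools import combinations
from math import comb


def link(u, v):
    """Canonical identifier for an undirected link between nodes u and v."""
    return (min(u, v), max(u, v))


# Hypothetical backup-path assignment on the complete graph K4 (nodes 1..4).
# Each entry maps a link to the set of links its single backup path traverses.
BACKUP = {
    link(1, 2): {link(1, 3), link(2, 3)},  # backup path 1-3-2
    link(1, 3): {link(1, 4), link(3, 4)},  # backup path 1-4-3
    link(1, 4): {link(1, 2), link(2, 4)},  # backup path 1-2-4
    link(2, 3): {link(2, 4), link(3, 4)},  # backup path 2-4-3
    link(2, 4): {link(1, 2), link(1, 4)},  # backup path 2-1-4
    link(3, 4): {link(1, 3), link(1, 4)},  # backup path 3-1-4
}


def is_tolerated(a, b, backup):
    """Assumed mutual-exclusion check: the failure of links a and b is
    tolerated unless each link lies on the other's backup path."""
    return not (b in backup[a] and a in backup[b])


def count_tolerated(backup):
    """Return (tolerated, total) over all unordered pairs of links."""
    pairs = list(combinations(sorted(backup), 2))
    tolerated = sum(1 for a, b in pairs if is_tolerated(a, b, backup))
    return tolerated, len(pairs)


def single_failure_lengths(backup):
    """Average and maximum backup path length (in hops) for single failures."""
    lengths = [len(path) for path in backup.values()]
    return sum(lengths) / len(lengths), max(lengths)


if __name__ == "__main__":
    print(count_tolerated(BACKUP))         # tolerated and total link pairs
    print(single_failure_lengths(BACKUP))  # average and maximum hop length
    print(comb(32, 2))                     # unordered pairs among 32 links
```

In this toy assignment, two of the 15 link pairs violate the assumed check because each link lies on the other's backup path. The total of 496 combinations quoted for ARPANET equals C(32, 2), the number of unordered link pairs in a 32-link network.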
D. Comparison With LP-FDP

The tables also compare the heuristic results (LP-LP) with the LP-FDP approach, which assigns failure-dependent backup paths for every link. First, LP-FDP recovers from all dual-link failures that do not disconnect the network. It can be observed that the benefit of knowing the link failures is significant under dual-link failure scenarios. The optimal (or best) values of the backup path lengths obtained by optimizing the average backup path lengths under LP-LP are found to be twice as high as the backup path lengths obtained under LP-FDP. Note that this increased backup path length for LP-LP may also result in an increased recovery time from the second failure compared to LP-FDP. While the significant advantages of employing the LP-LP recovery strategy in the network are that every link will have only one backup path and the failure need not be explicitly notified to nodes that are not present in the backup path, the increased backup path length under both single- and dual-failure scenarios may pose significant operational challenges.

VII. CAPACITY OPTIMIZATION FOR DUAL-LINK FAILURE RESILIENCY IN HIGHLY CONNECTED NETWORKS

The ILP developed in this paper has an objective of reducing the average path length, while the IMCP heuristic attempts to just find any feasible solution. Often network designers are interested in minimizing network capacity. While the ILP could be modified to minimize total capacity, a direct way to reduce the capacity would be to reduce the network to a minimally three-connected network with every link having one working and two spare fibers. This capacity calculation is only applicable for the LP-LP strategy. The ILP or IMCP heuristic may be employed on the minimally three-connected graph to obtain backup paths. The average backup path lengths (under single and double link failures) will increase as the reduced network will have lower connectivity.

It is possible to reduce the network capacity given the precise knowledge of the dual-link failure (even under any two arbitrary link failures). For example, assume that every link is assigned a backup path for every dual-link failure (similar to LP-FDP). Under a dual-link failure involving links and , if the backup paths (for ) and (for ) are link-disjoint, then every link on these two backup paths will require only one spare fiber for this joint failure. If such a requirement is satisfied for all dual-link failures, then every link requires only one spare fiber. If a network is minimally three-connected, then every link will require two spare fibers.

If a network is (even minimally) four-connected, then the above requirement can be satisfied for all the links in the network, as shown by the following theorem.

Theorem: Given a four-connected network with a dual-link failure involving any two arbitrary links (connecting nodes and ) and (connecting and ), there exist backup paths for link and for link such that and are link-disjoint.

Proof: After two link failures, the resultant network is either: 1) at least three-connected or 2) two-connected.

Case 1: If is three-connected, then by a variant of Menger's theorem there exist three link-disjoint paths from a source to three different destinations , , and . Setting , , , and , there exist three link-disjoint paths , , and . By combining the paths , and , a path (through ) is obtained that is link-disjoint with .

Case 2: If is two-connected, there exists a cut-set⁴ in the graph as shown in Fig. 11.

Fig. 11. A four-connected network where removal of links $\ell$ and $\ell'$ results in a two-connected network.

⁴ The figure depicts only the nodes and links that are of interest. Some nodes and links that lie within the sets to keep the network four-connected are not shown.

Given that the original network is four-connected, there exist four link-disjoint paths between any two nodes. Therefore, there exist four link-disjoint paths from to . The four link-disjoint paths must be of the form

As the above four paths between and are link-disjoint, the paths , , and must also be mutually link-disjoint. By combining the paths and , a path can be formed that is link-disjoint with the path . Using a similar argument, two link-disjoint paths and may be obtained. Note that the paths and are obviously link-disjoint with the paths and as they belong to two disjoint cut-sets. Using the above link-disjoint paths, backup paths for links and may be obtained as

The above two backup paths are link-disjoint as the individual path segments are mutually link-disjoint.

Incorporating such a constraint into LP-FDP will result in increased backup path lengths and requires further study. The capacity savings obtained in the network with the knowledge of failure locations could be significant. For example, consider a network with $n$ nodes (assume $n$ to be even) that can be either minimally three-connected or minimally four-connected. For a network to be minimally three (four)-edge-connected, every node must have a degree of at least three (four). The minimum number of links required to form a minimally three-connected network is $3n/2$. These links must be equipped with one
working fiber and two spare fibers to recover from any two link failures (irrespective of whether failure locations are known or not). Thus, a total capacity of $9n/2$ fibers is required for an $n$-node network. On the other hand, the minimum number of links required to form a minimally four-connected graph is $2n$, where each link will be equipped with one working and one spare fiber. Thus, a total capacity of $4n$ fibers is required.⁵ The Mesh-4×4 network (16 nodes and 32 links) considered in this study is a minimally four-connected network that can be reduced to a minimally three-connected network by removing eight links. The capacity provisioned under the minimally three-connected and minimally four-connected scenarios would be 72 (24 working + 48 spare) and 64 (32 working + 32 spare) fiber links, respectively.

⁵ Note that this computation does not take into account the lengths of the links. The cost of increasing the connectivity could be higher if the additional links are longer, resulting in lower capacity savings.

The advantages of a minimally four-connected network over a minimally three-connected network are: 1) reduction in total capacity by up to 11% (and possibly associated link costs like line amplifiers); 2) higher connectivity (resulting in shorter paths and faster recovery notification); and 3) up to 33% increased primary capacity even with the 11% reduction in total capacity. The drawbacks of the four-connected network are: 1) possible increase in network cost due to longer links; 2) increased switching requirement at nodes (due to the increase in node degree); and 3) increased link failure rate due to the increase in the number of links by up to 33%. Although precise quantification of the tradeoffs involved in the design will vary among networks, it is worth noting that designing a network with increased network connectivity may not necessarily be an expensive proposition.

VIII. CONCLUSION

This paper formally classifies the approaches for providing dual-link failure resiliency. Recovery from a dual-link failure using an extension of link protection for single-link failure results in a constraint, referred to as the BLME constraint, whose satisfiability allows the network to recover from dual-link failures without the need for broadcasting the failure location to all nodes. The paper develops the necessary theory for deriving the sufficiency condition for a solution to exist, formulates the problem of finding backup paths for links satisfying the BLME constraint as an ILP, and further develops a polynomial-time heuristic algorithm. The formulation and heuristic are applied to six different networks and the results are compared. The heuristic is shown to obtain a solution for most scenarios with a high failure recovery guarantee, although such a solution may have longer average hop lengths compared with the optimal values. The paper also establishes the potential benefits of knowing the precise failure location in a four-connected network that has lower installed capacity than a three-connected network for recovering from dual-link failures.

REFERENCES

A. Chandak and S. Ramasubramanian, "Dual-link failure resiliency through backup link mutual exclusion," in Proc. IEEE Int. Conf. Broadband Networks, Boston, MA, Oct. 2005, pp. 258–267.

J. Doucette and W. D. Grover, "Comparison of mesh protection and restoration schemes and the dependency on graph connectivity," in Proc. 3rd Int. Workshop Design of Reliable Communication Networks (DRCN 2001), Budapest, Hungary, Oct. 2001, pp. 121–128.

M. Medard, S. G. Finn, and R. A. Barry, "WDM loop-back recovery in mesh networks," in Proc. IEEE INFOCOM, 1999, pp. 752–759.

S. S. Lumetta, M. Medard, and Y. C. Tseng, "Capacity versus robustness: A tradeoff for link restoration in mesh networks," J. Lightw. Technol., vol. 18, no. 12, pp. 1765–1775, Dec. 2000.

G. Ellinas, G. Halemariam, and T. Stern, "Protection cycles in WDM networks," IEEE J. Sel. Areas Commun., vol. 18, no. 10, pp. 1924–1937, Oct. 2000.

W. D. Grover, Mesh-Based Survivable Networks: Options and Strategies for Optical, MPLS, SONET and ATM Networking. Upper Saddle River, NJ: Prentice-Hall, 2003.

M. Fredrick, P. Datta, and A. K. Somani, "Sub-graph routing: A novel fault-tolerant architecture for shared-risk link group failures in WDM optical networks," in Proc. 4th Int. Workshop Design of Reliable Communication Networks (DRCN 2003), Banff, AB, Canada, Oct. 2003, pp. 296–303.

M. Clouqueur and W. Grover, "Availability analysis of span-restorable mesh networks," IEEE J. Sel. Areas Commun., vol. 20, no. 4, pp. 810–821, May 2002.

M. Clouqueur and W. D. Grover, "Mesh-restorable networks with complete dual-failure restorability and with selectively enhanced dual-failure restorability properties," in Proc. OPTICOMM, 2002, pp. 1–12.

J. Doucette and W. D. Grover, "Shared-risk logical span groups in span-restorable optical networks: Analysis and capacity planning model," Photon. Netw. Commun., vol. 9, no. 1, pp. 35–53, Jan. 2005.

J. A. Bondy and U. S. R. Murthy, Graph Theory With Applications. New York: Elsevier, 1976.

H. Choi, S. Subramaniam, and H. Choi, "On double-link failure recovery in WDM optical networks," in Proc. IEEE INFOCOM, 2002, pp. 808–816.

H. Choi, S. Subramaniam, and H. Choi, "Loopback recovery from double-link failures in optical mesh networks," IEEE/ACM Trans. Netw., vol. 12, no. 6, pp. 1119–1130, Dec. 2004.

H. Choi, S. Subramaniam, and H.-A. Choi, "Loopback recovery from neighboring double-link failures in WDM mesh networks," Inf. Sci. J., vol. 149, no. 1, pp. 197–209, Jan. 2003.

CPLEX Solver. [Online]. Available: http://www.cplex.com

Srinivasan Ramasubramanian (S'99–M'02) received the B.E. (Hons.) degree in electrical and electronics engineering from Birla Institute of Technology and Science (BITS), Pilani, India, in 1997, and the Ph.D. degree in computer engineering from Iowa State University, Ames, in 2002. Since August 2002, he has been an Assistant Professor with the Department of Electrical and Computer Engineering, University of Arizona, Tucson. He is a codeveloper of the Hierarchical Modeling and Analysis Package (HIMAP), a reliability modeling and analysis tool, which is currently being used at Boeing, Honeywell, and several other companies and universities. His research interests include architectures and algorithms for optical and wireless networks, multipath routing, fault tolerance, system modeling, and performance analysis. He has served as the TPC Co-Chair of the BROADNETS 2005 conference and is an editor of the Springer Wireless Networks Journal.

Amit Chandak received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Bombay, India, in 2002, and the M.S. degree in electrical and computer engineering from the University of Arizona, Tucson, in 2004. From April 2005 to February 2007, he was a Network Software Engineer with Intel Corporation, where he was involved with the fields of network processors and virtualization technology. Since February 2007, he has been a Software Engineer with Cisco Systems, San Jose, CA, where he is part of the Switch Fabric and Line Card group for the CRS-1 platform. His interests include networking protocols, embedded systems, and fault tolerance.