A New Self-Routing Multicast Network

386 views
322 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
386
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
12
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

A New Self-Routing Multicast Network

  1. 1. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 11, DECEMBER 1999 1299 A New Self-Routing Multicast Network Yuanyuan Yang, Senior Member, IEEE, and Jianchao Wang, Member, IEEE AbstractÐIn this paper, we propose a design for a new self-routing multicast network which can realize arbitrary multicast assignments between its inputs and outputs without any blocking. The network design uses a recursive decomposition approach and is based on the binary radix sorting concept. All functional components of the network are reverse banyan networks. Specifically, the new multicast network is recursively constructed by cascading a binary splitting network and two half-size multicast networks. The binary splitting network, in turn, consists of two recursively constructed reverse banyan networks. The first reverse banyan network serves as a scatter network and the second reverse banyan network serves as a quasisorting network. The advantage of this approach is to provide a way to self-route multicast assignments through the network and a possibility to reuse part of network to reduce the network cost. The new multicast network we design is compared favorably with the previously proposed multicast networks. It uses y…n logP n† logic gates, and has y…logP n† depth and y…logP n† routing time where the unit of time is a gate delay. By reusing part of the network, the feedback implementation of the network can further reduce the network cost to y…n log n†. Index TermsÐMulticast network, self-routing, binary radix sorting network, reverse banyan network, compact routing, recursive construction. æ 1 INTRODUCTION M ULTICAST or one-to-many communication is one of the most important collective communication operations time for any k, I k log n.1 Since the routing algorithm in [4] relies on a cube or a perfect shuffle connected parallel I [1] and is highly demanded in parallel and distributed computer consisting of y…knI‡k † processors, as mentioned applications as well as in other communication environ- in [9], the routing process actually takes y…k logP n† gate ments. For example, multicast is required to make updates delays. Lee and Oruc [9] designed a multicast network with Ë in replicated and distributed databases and in commonly a special built-in routing circuit. Their network uses used parallel algorithms such as matrix multiplication and y…n logP n† logic gates, and has y…logP n† depth and Fast Fourier Transform (FFT). Multiprocessor systems also y…logQ n† routing time where the unit of time is a gate delay. require multicast for barrier synchronization and massage Another notable feature in our multicast network design is to adopt a self-routing scheme. Self-routing is a promising passing, and multicast is a critical operation for video/ routing scheme which usually renders a network with teleconference calls, video-on-demands services and dis- faster switch setting and lower hardware complexity. tance learning in a telecommunication environment. However, most of self-routing network designs described Clearly, providing multicast support at hardware/inter- in the literature are for permutation networks, for example, connection network level is the most efficient way support- [11], [12], [13], [14]. In a recent work, Cheng and Chen [14] ing such communication operations, [2], [3]. A switching designed a new self-routing permutation network con- network that can realize every multicast assignment structed by reverse banyan networks. between its inputs and outputs over edge-disjoint trees is In this paper, we propose a design for a new self-routing referred to as a multicast network. This type of network has multicast network, which is also based on the reverse been investigated by several researchers in the literature [4], banyan networks. We will explore the properties of a [5], [6], [7], [8], [9], [10]. reverse banyan network so that it can be used to handle In this paper, we design a new type of multicast network arbitrary multicast connections in a self-routing manner. using an approach based on recursive decompositions of Different from earlier proposed multicast networks, the multicast networks. This approach was first introduced by new multicast network is conceptually simple and has good Nassimi and Sahni [4] and was later adopted by Lee and modularity. The network design is based on the binary Oruc [9] in their design of a multicast network. The n  n Ë radix sorting concept and all functional components of the I multicast network proposed in [4] uses y…knI‡k log n† P  P network are recursively constructed reverse banyan net- switches and has y…k log n† depth and y…k log n† routing works. Therefore, it has a potential to greatly reduce the network cost by reusing part of the network. The new multicast network we design uses y…n logP n† logic gates, . Y. Yang is with the Department of Electrical and Computer Engineering, and has y…logP n† depth and y…logP n† routing time where State University of New York, at Stony Brook, Stony Brook, NY 11794. E-mail: yang@ece.sunysb.edu. the unit of time is a gate delay. Moreover, by reusing part of . J. Wang is with GTE Laboratories, 40 Sylvan Road, Waltham, MA 02454. the network, the feedback version of our design can reduce E-mail: jwang@gte.com. the network cost to y…n log n†. Manuscript received 3 Sept., 1997. log n For information on obtaining reprints of this article, please send e-mail to: 1. k is a network parameter which equals log m , and (IY m)-generator and n tpds@computer.org, and reference IEEECS Log Number 105737. …nY m†-concentrator are components of the network. 1045-9219/99/$10.00 ß 1999 IEEE
  2. 2. 1300 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 11, DECEMBER 1999 We may also represent this multicast assignment in a binary format: V V W W `& ' ` HII a & 'a HHH IHI 0Y IHH Y fHIHgY 0Y 0Y 0Y X X HHI X Y IIH Y III Clearly, a permutation assignment is a special case of a multicast assignment where each si has, at most, one element. In this paper, we design a multicast network based on binary radix sorting concept and refer to it as a binary radix Fig. 1. The construction of an n  n binary radix sorting multicast sorting multicast network (BRSMN). An n  n BRSMN can be network (BRSMN). recursively constructed by an n  n binary splitting network (BSN) followed by two n  n BRSMNs which are directly P P The rest of the paper is organized as follows: Section 2 linked to the upper half and the lower half of the outputs of introduces the binary radix sorting concept and gives the n  n BSN, respectively. The construction is shown in the recursive definition of the new multicast network Fig. 1. For an input i, depending on its destination set we based on it. Section 3 describes a key component of the have four possible cases: new multicast network, the binary splitting network. Section 4 discusses the basic building block of the binary . Case 1: If all elements in its destination set si are in splitting network, the reverse banyan network. Section 5 the upper half of the network outputs (i.e., the most further explores the properties of the reverse banyan significant bit of every binary address in si is H), network as a scatter network and as a quasisorting net- there will be a single connection from input i via the n  n BSN to an input of the upper n  n BRSMN work. Section 6 presents the distributed self-routing P P with the same destination set si . algorithms, and Section 7 discusses the implementation . Case 2: If all elements in si are in the lower half of the and complexity issues. Section 8 concludes the paper. network outputs (i.e., the most significant bit of Finally, Appendices A, B, and C give some detailed proofs every binary address in si is I), there will be a single and descriptions of the routing algorithms. connection from input i via the BSN to an input of the lower n  n BRSMN with the same destination P P 2 MULTICAST NETWORKS BASED oN BINARY set si . RADIX SORTING . Case 3: If some elements in si go to the upper half and other elements in si go to the lower half (i.e., the We consider an n  n interconnection network with n most significant bits of the addresses in si contain inputs and n outputs where n ˆ Pm . Clearly, each input or both H and I), there will be two connections from output address can be expressed as an m-bit binary number input i via the BSN. One goes to the upper n  n P P —H —I F F F —mÀI . For a multicast connection from network input BRSMN, and another goes to the lower n  n P P i (H i n À I) to a subset of network outputs, let si denote BRSMN. The original destination set si will be split the subset of the outputs that input i is connected to. si is into two subsets which form the destination sets of referred to as the destination set of the multicast connection, the corresponding inputs of the upper and the lower n n or simply the destination set of input i. Then, a multicast P  P BRSMNs, respectively. assignment can be expressed as a set fsH Y sI Y F F F Y snÀI g . Case 4: The destination set is empty (i.e., the input ƒ where, si ’ sj ˆ 0 for i Tˆ j and nÀI si fHY IY F F F Y n À Ig. iˆH carries no message). For example, the following is a multicast assignment of Then, for an n  n BRSMN, we also have four cases for P P an V  V network: ffHY IgY 0Y fQY RY UgY fPgY 0Y 0Y 0Y fSY TggX each input, but this time we need to check the second most Fig. 2. A routing example for a multicast assignment in an V  V BRSMN.
  3. 3. YANG AND WANG: A NEW SELF-ROUTING MULTICAST NETWORK 1301 Fig. 3. Legal operations on the four values in a P  P switch, where xY y P fHY IY Y g. (a)Parallel. (b) Crossing. (c) Upper broadcast. (d) Lower broadcast. significant bit of the binary addresses in the corresponding unicast with no value changed, and the operations in Fig. 3c destination set, and so on. Finally, for a P  P BRSMN (i.e., a and Fig. 3d are broadcast with values and on the inputs P  P switch), realizing a multicast or a unicast connection is changed to H and I on the outputs. straightforward. Fig. 2 shows the routing for the multicast In an n  n BSN using the four value routing tags, let nH , assignment in the example mentioned earlier in an V  V nI , n , and n denote the numbers of inputs with value H, I, binary radix sorting multicast network. From Fig. 2, we can , , respectively. Clearly, they satisfy some constraints. also see that n  n BRSMN is constructed by an n  n BSN First, we have (level 1), followed by P n  n BSN's (level 2), then followed P P by PP PP  PP BSN's (level 3), F F F , and finally followed by n n n nH ‡ nI ‡ n ‡ n ˆ nX …I† P P  P switches (level log n). For more discussions on this Since at most a half of the outputs are in the upper half of type of multicast network structure, readers may also refer the network outputs, we must have to [9]. Clearly, the problem of constructing a binary radix n n n H ‡ n —nd nI ‡ n X …P† sorting multicast network (BRSMN) is now transformed to P P the problem of designing a binary splitting network (BSN) Also, it is interesting to note that described above. n n Y …Q† 3 THE BINARY SPLITTING NETWORK which can be derived from (1) and (2). Now, the function of a BSN is to transform the input tags As described in Section 2, the function of the binary Hs, Is, s, and s to the output side such that all s are splitting network is to split (when necessary) the multicast eliminated, all Hs are in the upper half of outputs, all Is are connection on each input based on whether each output in the destination set of the multicast connection belongs to the in the lower half of outputs. The numbers of the different upper half or the lower half of the network outputs, and tags on the outputs of the BSN should have the following pass the (possibly split) multicast connections properly to ˜ ˜ ˜ ” relations: Let nH , nI , n , and n denote the numbers of the upper or lower n  n BRSMNs following it. To simplify outputs with value H, I, , and , respectively. Since any P P the routing in a BSN, we use a routing tag with four values paired with one will be transformed to a pair of H and I by for each link: H, I, , and , which correspond to Cases 1, 2, switch broadcast operations, we must have 3, and 4 described in the previous section, respectively, and ˜ ˜ ˜ ™ nH ˆ nH ‡ n Y nI ˆ nI ‡ n Y n ˆ n À n Y —nd n ˆ HX …R† are determined by the ith most significant bit of the multicast destination set on the inputs of BSNs at level i. Having described the function of a BSN, we now The operations on four values in a P  P switch are consider the construction of the BSN. An n  n BSN can simply an extension to those on only two values H and I. be constructed by cascading two n  n networks as shown Fig. 3 shows all legal operations on four values in a P  P in Fig. 4a. The first network is referred to as a scatter network switch, where, the operations in Fig. 3a and Fig. 3b are which scatters all s to Hs and Is. In other words, the Fig. 4. Tags scattered in the first subnetwork and then quasisorted in the second subnetwork. (a) The construction of a binary splitting network (BSN). (b) An example of routing in a BSN.
  4. 4. 1302 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 11, DECEMBER 1999 Cheng and Chen's permutation network [14]. An n  n RBN is recursively constructed by two n  n RBNs followed P P by an n  n merging network. An n  n merging network consists of one stage of n P  P switches, such that both the P input and the output links of the stage are connected according to the perfect shuffle interconnection function [15]. The merging network merges the outputs of the upper and the lower n  n RBNs to form the outputs of the n  n P P RBN. The recursive construction of an n  n RBN, and an n  n merging network are shown in Fig. 5. To facilitate the discussions on the functions of an RBN as a scatter network and as a quasisorting network, we need to introduce the following notations. Suppose an n-bit sequence of two symbols, say, and , is to be applied in parallel to the n inputs of a reverse Fig. 5. The recursive definition of an n  n RBN. The dashed box is an banyan network. We define an n-bit circular compact sequence n  n merging network. of two symbols s and s as follows: transformation from the inputs to the outputs of the scatter n ‰sŠ ‰lŠ ‰nÀsÀlŠ if s ‡ l n gsYlYY ˆ …S† network is ‰lÀn‡sŠ ‰nÀlŠ ‰nÀsŠ if s ‡ l b nY n fHY IY Y gà A fHY IY gà X where H s ` n and H l n. The real meaning of gsYlYY is that, in an n-bit sequence all l -bits are compacted together The second network is referred to as a quasisorting followed by also compacted n À l -bits in a circular way network which transfers all Hs to the upper half of the (modulo n), and s is the starting position for the -bit network outputs, and all Is to the lower half of the network sequence. This concept is very important in this paper. In outputs, while each of s may go to either the upper half or fact, the key results of this paper pertain to that under what the lower half. It is easy to see that the network constructed conditions two circular compact sequences can be merged by cascading a scatter network and quasisorting network into a longer circular compact sequence. For example, the n n can perform the functions of a BSN. special circular compact sequence gnYnYHYI ˆ H‰PŠ I‰PŠ is what n P P In Fig. 4b, we show how input tags in a BSN are scattered we expect after sorting tags 0s and 1s in a bit sorting in the first subnetwork and then quasisorted in the second network. In a later section, we will see that circular compact subnetwork. sequences will be used in merging and eliminating tags s In this paper, we use a reverse banyan network to and s in a scatter network. implement both the scatter network and the quasisorting Cheng and Chen [14] considered the circular compact network in a BSN. As can be seen later, this implementation sequence of Hs and Is in their permutation network design, of a BSN not only leads to a faster self-routing multicast and found an interesting property which was used in their network but also provides a possibility to reuse part of the bit-sorting network. Since this property is also very useful network to further reduce the network cost. In the following for the designs of the scatter network and quasisorting sections, we will mainly discuss how a reverse banyan network, we state it below in Theorem 1 in a more general network can perform the functions of a scatter network and form, and provide a different, much easier to understand a quasisorting network. proof for it. As can be seen later, the bit sorting network directly related to Theorem 1 will be used in the quasi-sorting network and the new technique used in the 4 THE REVERSE BANYAN NETWORK proof (i.e., Lemma 1) and some observations will be applied The reverse banyan network (RBN) used in our design is a to the design of the scatter network. variant of reverse banyan network, which was also used in Theorem 1. For any - values on the inputs of an RBN, a circular compact sequence with any starting position can be achieved at the outputs of the RBN under a proper setting for switches in the network. Suppose the inputs of the n  n RBN consist of l s and n À l s, among which the upper half n inputs contain lH s P and the lower half n inputs contain lI s, where lH ‡ lI ˆ l. P Assume Theorem 1 holds for an n  n RBN. We will prove P P that it also holds for an n  n RBN by giving a positive answer to the following question. Question 1. Given integers n, s, l, lH , and lI satisfying that n is Fig. 6. The connections through a switch in a merging network, where an even number, H s ` n, H l n, H lH Y lI n , and P — ˆ ex™h—nge …—†. l ˆ lH ‡ lI , do there exist integers sH and sI , H sH Y sI ` n , P
  5. 5. YANG AND WANG: A NEW SELF-ROUTING MULTICAST NETWORK 1303 n n n such that gsH YlH YY and gsI YlI YY can be merged to gsYlYY P P through an n  n merging network (as defined in this section) under a proper switch setting (in a one-to-one mapping manner)? Before we answer this question, we first review some observations on the shuffle/exchange interconnection func- tions. Consider an n  n merging network. Recall that the inputs (outputs) of the merging network are linked to switches according to the perfect shuffle function. Let — be a binary address of an input of a switch (with address ˜—™) in P the network. Then exchange…—† denoted as — is the other input of the switch. The inputs of the merging network with addresses shuffle…—† and shuffle…† are linked to inputs — and — Fig. 7. Four different switch settings for a P  P switch in an n  n — of the switch, respectively, and the outputs — and — of the merging network. (a) Parallel. (b) Crossing. (c) Upper broadcast. (d) switch are linked to the outputs of the merging network Lower broadcast. with addresses shuffle…—† and shuffle…†, respectively. (See — Fig. 6, and the definition of shuffle/exchange functions in in addition to switch settings ri ˆ H or I, let the switch [15].) We can see that connections are only possible between setting ri ˆ P if the switch is set to upper broadcast, and the inputs and the outputs of a merging network with ri ˆ Q if the switch is set to lower broadcast. We can extend addresses shuffle…—† and shuffle…†. Since only one-to-one — the binary circular compact switch setting to trinary. We connections are considered here, the switch has only two explain the extension by the following example. nA trinary settings: parallel and crossing. The parallel setting circular compact sequence of switch setting ‡sYlI YlP YI YP YQ P corresponds to the connections from input shuffle…—† to means lI consecutive P s followed by lP consecutive Q s and output shuffle…—† and from input shuffle…† to output — then followed by n À lI À lP I s in a circular way, with s P shuffle…† (see Fig. 7a); while the crossing setting corre- — being the starting position of the P s sequence. sponds to the connections from input shuffle…—† to output The following lemma gives a positive answer to shuffle…† and from input shuffle…† to output shuffle…—† (see — — Question 1. Fig. 7b). Note that jshuffle…—† À shuffle…†j ˆ n , that is, — P two inputs n apart in their addresses can be connected to P Lemma 1. Given integers n, s, l, lH , and lI satisfying that n is an outputs with the same addresses in a parallel or crossing even number, H s ` n, H l n, H lH Y lI n , and P way as shown in Fig. 7a and Fig. 7b, respectively. l ˆ lH ‡ lI . Let Let ri denote the setting for switch i. Let ri ˆ H if the n n switch is set to parallel, and ri ˆ I if the switch is set to sH ˆ s mod Y sI ˆ …s ‡ lH † mod P P crossing. Thus, the switch setting of the n switches in the P merging network is an n -bit sequence of Hs and Is. Similar and the switch setting of an n  n merging network be P n to n definition of circular compact sequence in (5), we use the ‡HYs Y˜Y˜ , where P I ‡sYlYHYI to specify the compact switch setting of n switches in P P h ni an n  n merging network, where l consecutive switches ˜ ˆ …s ‡ lH † div mod PY P have setting I with starting position s, and the rest of n n and ˜ ˆ …I À ˜† mod P. Then, gsH YlH YY and gsI YlI YY can be P P switches have setting H in a circular way. In the case of a n switch can also be set to upper broadcast and lower merged to gsYlYY through the n  n merging network under broadcast as shown in Fig. 3c and Fig. 3d, for any switch i, the above switch setting. Fig. 8. The binary tree structure of an RBN. (a) The forward and backward phases in the binary tree. (b) The forward and backward trees embedded in a reverse banyan network.
  6. 6. 1304 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 11, DECEMBER 1999 Proof. See Appendix A. u t the switch have values 1s. By exploring the properties of the circular compact sequence, we can obtain the following Lemma 1 is actually the inductive step of the proof of general results for an n  n RBN with any numbers of s Theorem 1, and it is easy to check that for n ˆ P Theorem 1 and s on its inputs. This result will be used in the proof of holds. Hence, Theorem 1 is proved. Theorem 2. Note that for a full permutation assignment, only two tag Theorem 3. For any values on the inputs of an n  n RBN, let values H and I are used. Let be H, be I, and the initial n , and n be the numbers of inputs with value and , starting position for an n  n RBN be s ˆ n . Clearly the total P respectively. Clearly, n ‡ n n, and the rest of the inputs of number of Is in this case is l ˆ n . By Theorem 1, the circular P n the RBN have values 1s. Let s be any integer such that n compact sequence gsYlYHYI ˆ H‰PŠ I‰PŠ , which represents the n H s ` n. Then, function of bit sorting (in an ascending order), can be achieved by a proper switch setting. 1. n if n n , a circular compact sequence gsYn Àn Y1Y with any starting position s can be achieved at the outputs of the RBN under a proper setting for switches 5 CONSTRUCTING THE BINARY SPLITTING in the network; NETWORK BY THE RBNS 2. n if n ! n , a circular compact sequence gsYn Àn Y1Y Recall that a binary splitting network consists of a scatter with any starting position s can be achieved at the network and a quasisorting network. In this section, we outputs of the RBN under a proper setting for switches show how to use the same basic component network, a in the network. reverse banyan network, to perform the function of a scatter In the above theorem, we say that (or ) is the dominating network and the function of a quasisorting network, type among and if n n (or n ! n ). respectively. We now further explore the property of the circular 5.1 The Reverse Banyan Network as a Scatter compact sequence in order to deal with multicast assign- Network ments. In the case of merging two circular compact In the last section, we considered only two tag values sequences with the same set of binary values, (for example, (0 and 1) for a full permutation assignment in an RBN. merging two sequences with 1s and s, gsH YlH Y1Y and n P Now, for a multicast assignment in an RBN, we must deal n gsI YlI Y1Y ), we can simply use the results in Theorem 1 or P with four tag values H, I, , and . Recall that the function of Lemma 1. However, in other cases, we may also need to a scatter network is to split the s on its input into Hs and Is merge two circular compact sequences with different sets of on its output. binary values (for example, merging a sequence with 1s and The following theorem gives one of the main results of n n this paper, which states that the function of a scatter s, gsH YlH Y1Y , and a sequence with 1s and s, gsI YlI Y1Y ). In this P P network can be achieved under a proper switching setting. case, we need to add two more switch settings: upper The proof of this theorem is given at the end of this broadcast and lower broadcast, and use the notation of the subsection. trinary circular compact switch setting defined in Section 4. Theorem 2. Given an n  n RBN, which is used as the scatter To merge two compact sequences with different sets of network of an n  n BSN, with H, I, and values on the binary values, the following question must be answered. inputs. Let nH , nI , n , and n be the numbers of inputs with value H, I, , and , respectively. Then under a proper setting Question 2. Given integers n, s, l, lH , and lI satisfying that n is for switches in the network, s can be eliminated at the outputs an even number, H s ` n, H l n, H lI lH n , and P of the RBN, i.e. the outputs of the RBN have only values H, I, l ˆ lH À lI , do there exist integers sH and sI (H sH Y sI ` n ) P n n and with n such that gsH YlH Y1Y and gsI YlI Y1Y can be merged to gsYlY1Y P P through an n  n merging network under a proper switch ˜ ˜ ˜ ™ nH ˆ nH ‡ n Y nI ˆ nI ‡ n Y n ˆ n À n Y —nd n ˆ HY setting? ˜ ˜ ˜ ™ where nH , nI , n , and n are the numbers of outputs with value H, I, , and , respectively, and satisfying that H nH Y nI n ˜ ˜ The following lemma gives a positive answer to P ˜ ˜ ˜ and nH ‡ nI ‡ n ˆ n. Question 2. Lemma 2. Given integers n, s, l, lH , and lI satisfying that n is an It is worth pointing out that although n n holds even number, H s ` n, H l n, H lI lH n , and globally for the original n  n RBN which is the scatter P l ˆ lH À lI . Let network of an n  n BSN (see (3)), for the recursively defined subnetwork nH  nH RBN of the n  n RBN, we may n n sH ˆ s mod Y sI ˆ …s ‡ l† mod have nH ! nH , where nH and nH are the numbers of inputs (of P P this nH  nH RBN) with values and , respectively. This is and the switch setting be because that s and s are distributed nonuniformly. n To simplify the problem, we can combine H and I into a 1. ‡sPI YlI YHYP , if s ‡ l ` n ; P single value 1. A link has a value 1 if it has a single value H 2. n ‡sPI YlI YnÀsI ÀlI YIYPYH , if s ` n and s ‡ l ! n ; P P or I. If the two inputs of a P  P switch have values and n P respectively, can be scattered so that the two outputs of 3. ‡sPI YlI YIYP , if s ! n and s ‡ l ` n; P
  7. 7. YANG AND WANG: A NEW SELF-ROUTING MULTICAST NETWORK 1305 n n 4. ‡sPI YlI YnÀsI ÀlI YHYPYI , if s ! n and s ‡ l ! n. P 4. ‡sPH YlH YnÀsH ÀlH YIYQYH , if s ! n and s ‡ l ! n, P P P n n n n n P P n Then, gsH YlH Y1Y and gsI YlI Y1Y can be merged to gsYlY1Y through then, gsH YlH Y1Y and gsI YlI Y1Y can be merged to gsYlY1Y through P P an n  n merging network under the above switch setting. an n  n merging network under the above switch setting. Proof. See Appendix B. u t Now we are in the position to prove Theorem 3 by using Lemmas 1, 2 ,3, 4, and 5. Question 2 has three other symmetric variants, which Proof of Theorem 3. By induction on the network size n. can be obtained by changing the condition in Lemma 2 to lI ! lH and l ˆ lI À lH , and/or swapping for . The For n ˆ P (base case), the network is a P  P switch. There corresponding solutions to these three variants are also are six possible cases: symmetric. Lemmas 3, 4, and 5 give such solutions. . Case 2.1. n ˆ n ˆ H. Let the switch set to By swapping lH for lI in Lemma 2, we can obtain the parallel. Then both outputs of the switch have following lemma. P 1s, that is, sequence gsYHY1Y (or equivalently P Lemma 3. Given integers n, s, l, lH , and lI satisfying that n is an gsYHY1Y ) can be achieved at the outputs of the even number, H s ` n, H l n, H lH lI n , and P switch for H s I. l ˆ lI À lH . Let . Case 2.2. n ˆ n ˆ I. Let the switch set to either upper-broadcast or lower-broadcast (depending n n sH ˆ …s ‡ l† mod Y sI ˆ s mod on which input of the switch has ). Then both P P outputs of the switch have 1s as in Case 2.1. and the switch setting be . Case 2.3. n ˆ I and n ˆ H. Let the switch set to n parallel, if s ˆ H and the upper input has , or 1. ‡ P sH YlH YIYP, if s ‡ l ` n ; P s ˆ I and the lower input has . Otherwise, let n 2. ‡ P sH YlH YnÀsH ÀlH YHYPYI , if s ` n and s ‡ l ! n ; P P P the switch set to crossing. Then sequence gsYIY1Y P 3. ‡ n P , if s ! n and s ‡ l ` n; can be achieved at the outputs of the switch for sH YlH YHYP P n H s I. 4. ‡ P sH YlH YnÀsH ÀlH YIYPYH , if s ! n and s ‡ l ! n, P . Case 2.4. n ˆ H and n ˆ I. Similar to Case 2.3. P n n then, g P sH YlH Y1Y and g P sI YlI Y1Y n can be merged to gsYlY1Y through an . Case 2.5. n ˆ P and n ˆ H. Let the switch set to P parallel. Then sequence gsYPY1Y can be achieved at n  n merging network under the above switch setting. the outputs of the switch for H s I. The two other variants are obtained by simply swapping . Case 2.6. n ˆ H and n ˆ P. Similar to Case 2.5. for , and changing upper broadcast to lower broadcast in Hence, Theorem 3 holds for n ˆ P. Now, assume Lemma 2 and Lemma 3, respectively. Theorem 3 holds for an n  n RBN (inductive hypothesis) P P Lemma 4. Given integers n, s, l, lH , and lI satisfying that n is an and consider an n  n RBN. In the recursive definition of even number, H s ` n, H l n, H lI lH n , and an n  n RBN, let nH and nH denote the numbers of s P l ˆ lH À lI . Let and s, respectively in the upper n  n RBN, and nHH and P P nHH denote the numbers of s and s in the lower n  n P P n n RBN, respectively. Clearly, we have sH ˆ s mod Y sI ˆ …s ‡ l† mod P P nH ‡ nHH ˆ n —nd nH ‡ nHH ˆ n X and the switch setting be n We first assume n n and consider the following 1. ‡sPI YlI YHYQ , if s ‡ l ` n ; n P cases: 2. ‡ P sI YlI YnÀsI ÀlI YIYQYH , if s ` n and s ‡ l ! n ; P P n P n . Case n 1. nH nH and nHH nHH . By the inductive 3. ‡ P sI YlI YIYQ, if s ! P and s ‡ l ` n; P n hypothesis, Theorem 3 holds for both upper and 4. ‡ P n sI YlI Y PÀsI ÀlI YHYQYI , if s ! n and s ‡ l ! n, P lower n  n RBNs. That is, for anyn given integers P P sHn a n d sI ( H sH Y sI ` n ) , gsH YnH ÀnH Y1Y a n d n n P n then, g P sH YlH Y1Y and g P sI YlI Y1Y can be merged to gsYlY1Y through an P gsI YnHH ÀnHH 1Y can be achieved at the outputs of the P n  n merging network under the above switch setting. upper and lower n  n RBNs, respectively. Now, P P Lemma 5. Given integers n, s, l, lH , and lI satisfying that n is an let lH ˆ nH À nH , lI ˆ nHH À nHH , and l ˆ n À n , even number, H s ` n, H l n, H lH lI n , and which implies l ˆ lH ‡ lI . Then by Lemma 1, P given any integer s (H s ` n), there exist l ˆ lI À lH . Let integers sH and sI (H sH Y sI ` n ) such that n n P n n n gsH YlH Y1Y and gsI YlI Y1Y can be merged to gsYlY1Y P P sH ˆ …s ‡ l† mod Y sI ˆ s mod P P through the n  n merging network under a and the switch setting be proper switch setting. Hence, Theorem 3 holds n in this case. 1. ‡sPH YlH YIYQ , if s ‡ l ` n ; P . Case n 2. nH nH and nHH ! nHH . By the P 2. ‡ n P , if s ` n and s ‡ l ! n ; inductive hypothesis, for any given integers sH n sH YlH YnÀsH ÀlH YHYQYI P P n P n a n d sI ( H sH Y sI ` n ) , gsH YnH ÀnH Y1Y a n d n P P 3. ‡ P , if s ! sH YlH YHYQ P and s ‡ l ` n; gsI YnHH ÀnHH Y1Y can be achieved at the outputs of P
  8. 8. 1306 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 11, DECEMBER 1999 the upper and lower n  n RBNs, respectively. P P TABLE 1 Let lH ˆ nH À nH , lI ˆ nHH À nHH , and l ˆ n À n . An Encoding Scheme for Tag Values Then we have l ˆ lH À lI and lH ! lI . By Lemma 4, given any integer s (H s ` n), there exist integers sH and sI (H sH Y sI ` n ) such that n n P n gsH YlH Y1Y and gsI YlI Y1Y can be merged to gsYlY1Y P P through the n  n merging network under a proper switch setting. Thus, the result is also true in this case. In the following, we discuss how an RBN can be used as . Case n 3. nH ! nH and nHH nHH . Similar to Case n 2, a quasisorting network for partial permutation or multicast P P by the inductive hypothesis and Lemma 3, we can assignments. As can be seen in the last subsection, the see Theorem 3 holds in this case. u t outputs of the n  n RBN as a scatter network have tag values H, I, and only, which are passed to the inputs of the Thus, we have proved Theorem 3 for n n . Symme- n  n RBN as a quasisorting network, and the numbers of trically, we can show that the theorem holds for n ! n . such Hs or Is are no more than n . The function of an n  n P In the proof of Theorem 3, we can see that the number of RBN as a quasisorting network is to route all Hs and Is on the tag values of dominating type in the outputs of an RBN the inputs of the RBN to the upper and lower halves of its is the sum of those in its two sub-RBNs, if the two outputs, respectively, and to route s to the remaining sub-RBNs have the same dominating type. In this case, positions at the outputs. We can let some of s be dummy Hs Lemma 1 is applied, and we refer to it as /-addition. On (denoted as H s) and the rest of s be dummy Is (denoted as the other hand, the number of the tag values of dominating I s), such that both the number of all Hs (including all the type is the difference between those of its two sub-RBNs, real Hs and the dummy Hs) and the number of all Is if the two-sub RBNs have different dominating types. In (including all the real Is and the dummy Is) are equal to n . this case, Lemmas 2, 3, 4, and 5 are applied, and we refer P to it as /-elimination. Then by applying the bit sorting results in Theorem 1 we Finally, we are in the position to prove Theorem 2 by can achieve the quasisorting. The distributed algorithm using Theorem 3. dividing s to H s and I s will be given in Section 6. Proof of Theorem 2. Since n n holds for the original n  n RBN which is the scatter network of an n  n BSN 6 THE DISTRIBUTED SELF-ROUTING ALGORITHMS (see (3)), by Theorem 3 we can always eliminate s at the FOR RBNS outputs of the RBN under a proper switch setting (i.e., Lemmas 1, 2, 3, 4, and 5 actually provide a way to perform n gsYn Àn Y1Y can be achieved at the outputs of the RBN, switch setting for all the switches in an RBN as a bit sorting where s can be any number between H and n À I). On the network and as a scatter network, while switch setting for other hand, from the proof of Theorem 3, we know that a an RBN as a quasisorting network can be transformed to tag value H or I, once presented on a link at some stage of that for an RBN as a bit sorting network after all the s on the RBN, will be passed in a unicast way to some output the inputs are properly divided into H s and I s. without any value change. We also know that value In this section, we give a higher level description of the changes among Hs, Is, s, and s occur only in some P  P distributed self-routing algorithms used in switch setting switches. In this case, two inputs of the switch have for each type of RBNs: As a bit sorting network and as a and , respectively, the switch setting is upper or lower scatter network, and describe a distributed algorithm for broadcast (this is always true in our design), and two dividing s to H s and I s in a quasisorting network. outputs of the switch have H and I respectively. Needless to say, these switch setting algorithms can be Consequently, a pair of and are transformed to a implemented using proper logic circuits. In Section 7, we pair of H and I. In total, n such pairs are transformed. will discuss how they can be implemented in a pipelined ˜ ˜ ˜ ™ Hence, nH , nI , n , and n , which are the numbers of fashion so that the circuits used for the switch setting can be outputs with value H, I, , and , respectively, satisfy the distributed to each switch module and a lower hardware following: cost can be achieved. By observing Lemmas 1, 2, 3, 4, and 5, we can see that ˜ ˜ ˜ ™ nH ˆ nH ‡ n Y nI ˆ nI ‡ n Y n ˆ n À n Y —nd n ˆ HX switch setting at the last stage of RBN can be obtained if n, Also by using (1) and (2), we show that H ˜ ˜ nH Y nI n s, lH , and lI are given. Before the switch setting being P ˜ ˜ ˜ and nH ‡ nI ‡ n ˆ n. u t determined, l must be obtained from lH and lI (forward phase), and sH and sI must be obtained from s, l (or lH in 5.2 The Reverse Banyan Network as a Quasisorting Lemma 1) and n (backward phase). We can use the Network recursive property of an RBN to design a distributed The results of Theorem 1 can be used to perform bit sorting routing algorithm for switch setting in the RBN. in an RBN, that is, under a proper switch setting, all Hs and Based on the recursive construction of an RBN, we can Is on the inputs of the RBN can be routed to its outputs in formulate the structure of an RBN into a complete binary an ascending order. However, this works only for full tree shown in Fig. 8a. The root node of the tree represents permutation assignments, in which each input of the RBN the original RBN as a bit sorting network, a scatter network, has a tag value either H or I, and cannot be directly applied or a quasisorting network; the two child nodes of the root to partial permutation or multicast assignments. represent the two recursively defined sub-RBN networks of
  9. 9. YANG AND WANG: A NEW SELF-ROUTING MULTICAST NETWORK 1307 TABLE 2 Comparisons of Recursively Constructed Multicast Networks the original RBN and so on; and, finally, the leaves of the The self-routing algorithm for the RBN as a bit sorting tree represent the inputs of the original RBN. network is shown in Table 3 in Appendix C. It directly The distributed algorithms are performed by each node follows Lemma 1. The self-routing algorithm for the RBN as of the binary tree. The algorithms start from leaves and a scatter network is described in Table 4 in Appendix C. perform forward computations all the way to the root, and This algorithm actually combines the results of all cases in then start from root and perform backward computations Lemmas 1, 2, 3, 4, and 5. The variable type in the all the way to the leaves. For a clearer presentation, we omit algorithm represents the dominating type among s and the initialization step in each of the algorithms. s in the corresponding sub RBN network of size nH  nH . If the two dominating types in the forward inputs agree 6.1 Distributed Switch Setting Algorithms (i.e., /-addition), Lemma 1 applies as in the algorithm The switch setting at a node of the binary tree means the for the RBN as a bit sorting network; otherwise (i.e., compact switch setting for all the switches in the last stage /-elimination) Lemmas 2, 3, 4, and 5 apply. The of the RBN represented by this node. Typically, there are functions BinaryCompactSetting() and TrinaryCompactSet- three phases performed at each node of the tree. In the ting() used in the algorithms correspond to the compact forward phase, whenever the two forward inputs lH and lI switch setting in those lemmas, and are described in are available, l is computed and sent forward to the node in Table 5 in Appendix C. All switches in one stage are set the next stage. In the backward phase, whenever the two simultaneously according to the forward and backward forward inputs lH and lI and the backward input s are values of the nH  nH subnetwork and its own switch- H address in modulo n . available, sH and sI are generated and sent backward to the P nodes in the previous stage. In the switch setting phase, 6.2 Distributed -Dividing Algorithm in a under the same precondition as the backward phase, each Quasisorting Network switch is set simultaneously by using the forward and As stated in the last section, in an n  n RBN quasisorting backward input values and its own address in the stage. network, the tag value on each input belongs only to TABLE 3 The Distributed Self-Routing Algorithm for the RBN as a Bit Sorting Network
  10. 10. 1308 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 11, DECEMBER 1999 TABLE 4 The Distributed Self-Routing Algorithm for the RBN as a Scatter Network fHY IY g, and the number of the inputs with tag H or the n ˆ nH ‡ nHH X …T† number of the inputs with tag I are no more than n . TheP When the forward phase reaches the root node, the function of -dividing algorithm is to reassign H (dummy H) backward phase starts, which will determine that among or I (dummy I) to each input with tag value such that n of s, how many should be dummy Hs (H s) and how both the number of all Hs (including real Hs and dummy Hs) many should be dummy Is (I s). Denote the numbers as nH and the number of all Is (including real Is and dummy Is) and nI , respectively. We must have on the inputs of the network are equal to n . P The distributed -dividing algorithm is given in Table 6 n ˆ nH ‡ nI Y …U† in Appendix C. There are a forward phase and a backward and phase performed at each node of the binary tree in Fig. 8a. The forward phase is to compute n , the number of s in the nH ˆ nHH ‡ nHHH Y …V† inputs of the sub-RBN represented by this node, when nH and nHH , the numbers of s in the inputs of its two sub-RBNs nI ˆ nHI ‡ nHHI Y …W† are available. Clearly, we have
  11. 11. YANG AND WANG: A NEW SELF-ROUTING MULTICAST NETWORK 1309 TABLE 5 The Compact Switching Setting Algorithms for the Last Stage of an nH  nH RBN where nHH and nHI , and nHHH and nHHI are the numbers of an H , and if the nI of the leaf node is I, this input is dummy Hs and dummy Is for two child nodes, respectively. assigned an I . Equations (6), (8), and (9) are invariants for all but leaf nodes and (7) is an invariant for all nodes. To balance the 7 IMPLEMENTATION ISSUES AND COMPLEXITY number of Hs and the number of Is (both ªrealº and ªdummyº ones), initially in the backward phase, nH and nI ANALYSES of the root node can be set to In this section, we first discuss some implementation issues pertaining to the newly designed self-routing multicast n nI ˆ À nI —nd nH ˆ n À nI network, and then analyze the complexity of the network. P where nI represents the number of the real Is of the RBN, 7.1 Routing Tag Format and Handling which can also be computed through the forward phase. To send a multicast message (i.e., multidestination message) Then, nHH and nHI , and nHHH and nHHI of the child node can be through a multicast network, the header of the message computed and passed backward as in the backward must carry the information of the addresses of the algorithm. That is, when forward inputs nH and nHH and destination nodes. This information, as an overhead due backward inputs nH and nI are available, the backward to the nature of this type of communication, is considered as outputs is computed as follows: part of the message data. The preparation of the header is nHH ˆ minfnH Y nH gY nHI ˆ nH À nHH Y done before the message enters the network. For more detailed discussions on this issue and various multiaddress nHHH ˆ nH À nHH Y —nd nHHI ˆ nHH À nHI X encoding schemes, readers can refer to [3]. In this subsec- It can be verified that the invariants shown in (6), (7), (8), tion, we introduce a multiaddress encoding scheme for the and (9) hold at every node. Finally, in the last step of the routing tag of the proposed self-routing multicast network, algorithm, if the nH of a leaf node is I, this input is assigned and briefly describe the routing tag handling technique.
  12. 12. 1310 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 11, DECEMBER 1999 TABLE 6 The Distributed Algorithm Dividing s to H s and I s in the RBN as a Quasi-Sorting Network In order to describe the multicast routing tag format, we through the n  n BSN so that the tags can be used as first need the following preparations. routing tag sequence(s) of the upper or lower or both n  n P P A multicast, represented by a set of addresses of its BSNs. As shown in Fig. 10, if —H equals , it passes destinations in binary, in an n  n BRSMN can be expressed —I Y —Q Y F F F Y —PnÀQ to the upper n  n BSN and —P Y —R Y F F F Y —PnÀP P P as a complete binary tree of height log n (number of levels) to the lower n  n BSN simultaneously; if —H equals H (or I), P P with each node assigned a tag value chosen from fHY IY Y g. it passes only —I Y —Q Y F F F Y —PnÀQ (or —P Y —R Y F F F Y —PnÀP ) to the In this binary tree, level i (I i log n) gives the informa- upper (or lower) n  n BSN. Passing —i s alternatively to the tion of the ith most significant bits of the destinations of the P P upper and the lower subnetworks is for efficiency con- multicast. The binary tree can also be defined recursively as follows: The root node (i.e., the unique node in level 1) of sideration. In fact, this way only a constant number of the tree represents the entire multicast, and its left (or right) buffers are needed to store the tag sequence at each input of child node in level 2, represents a multicast which is formed a BSN as it passes through the network. To ensure the by all the destination addresses of the original multicast message is sent to the destinations of the multicast, the with the most significant bit being 0 (or 1). In general, for a sequences —I Y —Q Y F F F Y —PnÀQ and —P Y —R Y F F F Y —PnÀP must match node in level i …I i log n À I†, its left (or right) child the left and the right subtrees of the original complete node in level i ‡ I represents a multicast which is formed binary tree. This should also hold for all smaller BSNs. by all the destination addresses of the multicast at this node Thus, the routing tag sequence —H Y —I Y —P Y F F F Y —PnÀQ Y —PnÀP in level i with the ith most significant bit being 0 (or 1). The consists of all tags of the complete binary tree arranged in a tag assigned to each node in the binary tree depends on the proper order. In the following, we give a formal definition multicast it represents. For a node in level i …I i log n†, of the routing tag sequence. it is assigned the tag if and only if the destination Let SEQ denote the final routing tag sequence for a addresses of the multicast it represents have both 0 and 1 in multicast. Let the tags of all PiÀI nodes (from left to right) in the ithe most significant bit; it is assigned the tag 0 (or 1) if, and only if, the destination addresses of the multicast it level i …I i log n† be tiI Y tiP Y F F F Y tiYPiÀP Y tiYPiÀP ‡I Y F F F Y tiYPiÀI represents have only 0 (or 1) in the ith most significant bit; it which is denoted as sequence ƒii (Fig. 11 gives an is assigned the tag if and only if it represents an empty example for n ˆ IT). Let conc be a function which con- multicast. Clearly, for a node in level i …I i log n À I†, if catenates a number of sequences to form a single sequence, it has a tag , both its children must have a tag other than ; merge be a function which merges two sequences of equal if it has a tag 0 (or 1), its left (or right) child must have a tag length and is defined as: other than , and its right (or left) child must have a tag ; and if it has a tag , both its children have a tag . It can be merge…˜I Y ˜P Y F F F Y ˜k Y ™I Y ™P Y F F F Y ™k † ˆ ˜I Y ™I Y ˜P Y ™P Y F F F Y ˜k Y ™k shown that, for a given multicast, the binary tree with tags …IH† defined above is unique. Fig. 9a and Fig. 9b give two and order be a recursive function of a sequence of length examples of the binary trees with a tag at each node and k ˆ Pi which is defined as: their corresponding multicasts. Now we are in the position to describe the multicast order…˜I Y F F F Y ˜k † ˆ merge…order…˜I Y F F F Y ˜k †Y routing tag format and its handling technique. First of all, P …II† any network input without a message is always assumed to order…˜k‡I Y F F F Y ˜k ††Y —nd order…˜I † ˆ ˜I X P have a tag . Each message at a network input is assigned a Finally, the routing tag sequence is defined as: routing tag sequence —H Y —I Y —P Y F F F Y —PnÀQ Y —PnÀP , which corresponds to the complete binary tree of the multicast ƒi ˆ ™on™…order…ƒiI †Y order…ƒiP †Y F F F Y on the input. We use —H as the routing tag to route the …IP† order…ƒilog n ††X message on this input in the n  n BSN, i.e., set up switches in the scatter and quasisorting networks as described in In the example shown in Fig. 11 for n ˆ IT, we have the Section 5; then pass the rest of tags along the setup path(s) ordered sequences of four levels,
  13. 13. YANG AND WANG: A NEW SELF-ROUTING MULTICAST NETWORK 1311 Fig. 9. (a) and (b) Examples of the binary trees with tags and their corresponding multicasts. (c) Routing tag sequences for (a) and (b) and their routing. order…ƒiI † ˆ tII Y First of all, as can be seen, the different tag values H, I, , order…ƒiP † ˆ tPI Y tPP Y and (including H and I ) are used in the RBN as a scatter order…ƒiQ † ˆ tQI Y tQQ Y tQP Y tQR Y —nd network and as a quasisorting networks. We need to use order…ƒiR † ˆ tRI Y tRS Y tRQ Y tRU Y tRP Y tRT Y tRR Y tRV X three bits, ˜H ˜I ˜P to represent a tag value. Table 1 lists an encoding scheme for these tag values. Also, when we Thus, the final routing tag format is their concatenation: compute the numbers of s, s, and Is on the inputs, we need a combination of all the bits for every input tag. In this tII Y tPI Y tPP Y tQI Y tQQ Y tQP Y tQR Y tRI Y tRS Y tRQ Y tRU Y tRP Y tRT Y tRR Y tRV X …IQ† encoding, we can use ˜H ” ˜I of an input tag for counting the We omit the proof of (12). Instead, we verify its number of for this input, and use ˜H ” ˜I for counting the correctness by checking the tag sequence (13) in the number of of an input in the forward phases of example and, for simplicity, we check only one branch of self-routing for the RBN as a scatter network and -dividing the binary tree. By using our tag handling method, the algorithm for the RBN as a quasisorting network. tag sequence tPI Y tQI Y tQP Y tRI Y tRQ Y tRP Y tRR , which corresponds Furthermore, since only Hs, Is, and s (including H s and to the left subtree, is sent to the upper V  V BSN; and I s) can be the input tags for the RBN as a quasi-sorting then the tag sequence tQI Y tRI Y tRP , which corresponds to the network, we need to use only bit ˜P of an input tag for first subtree in level Q, is sent to the upper R  R BSN, counting all Is (including real and dummy Is) in the and so on. For the multicast examples in Fig. 9a and forward phase of self-routing for this RBN. Fig. 9b, their routing tag sequences are HH and Second, the binary tree structure used in all distributed IHII, respectively, and Fig. 9c shows the handling for algorithms can be embedded into an RBN. For a balanced these two sequences. hardware distribution, we separate the tree to a forward tree and a backward tree as shown in Fig. 8b, where the first 7.2 Circuit Design Issues and the last switches of the last stage of a sub-RBN network Based on the distributed self-routing algorithms presented can serve as the nodes in the forward tree and the backward in the last section, we can design a self-routing circuit for tree, respectively, and the switches in between can use the our RBN-based multicast network in a similar approach to results from these two nodes for their switch settings. Cheng and Chen's [14] self-routing circuit design for their Third, we can see that the most frequently used RBN-based permutation network. In this subsection, we operation in the distributed algorithms is addition (or only briefly discuss some circuit design issues. addition-like operations). For an n  n RBN, the maximum Fig. 10. Three possible cases of routing in an n  n BSN.
  14. 14. 1312 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 11, DECEMBER 1999 Fig. 12. One bit adder used in a pipelined fashion. Note that the distributed routing algorithms work in a pipelined fashion. It takes y…log n† unit time delay for the first bit (from input) to reach a switch in the last stage of an Fig. 11. A binary tree represents the routing tag sequence of a multicast n  n RBN. Then it takes only y…I† unit time for each of the for n ˆ IT. subsequent log n À I bits to reach a switch in the last stage. Hence, the propagation delay in the forward phase is values of n , n , or nI are n, which can be represented by y…log n† unit time, and so is in the backward phase. Let „ …n† using at most log n bits. We implement the adder in a denote the total propagation delay of the routing algorithm pipelined fashion as shown in Fig. 12. Then, the log n bit for an n  n BRSMN. Since there are a constant number of adder can be reduced to a one bit adder. Also, since the forward and backward phases in switch setting in a BSN, distributed algorithms work in a pipelined fashion, the the routing time for an n  n BSN is y…log n†. From delay caused by running the forward and the backward „ …n† ˆ y…log n† ‡ „ …n†, we obtain „ …n† ˆ y…logP n†. P processes is also reduced. For the feedback implementation discussed in Section 7.3 Feedback Implementation of the Network 7.3, since an n  n BRSMN is now simply an n  n RBN and the routing circuit distributed in each switch still has y…I† Note that in our design all major functional components are cost, we conclude that the feedback version of our design recursively defined reverse banyan networks. We can easily reuse part of the network to reduce network cost. In has y…n log n† network cost. particular, we can construct a feedback version of our In Table 2, we list the cost, depth and routing time of the multicast network, BRSMN as depicted in Fig. 13. Let an newly designed multicast network along with those of the n  n BSN use only one n  n RBN, with each output fed existing multicast networks constructed by the recursive back to the input with the same address. The first pass on decomposition approach. Finally, we can see that the the RBN functions as a scatter network and the second pass RBN-based multicast network presented in this paper functions as a quasisorting network. Note that in the achieves the same order of complexities of network cost, original design of the BRSMN, (see Fig. 2 for an example), network depth, routing time as those of the RBN-based the n  n BSN is followed by two n  n BSNs, and so on. permutation network proposed by Cheng and Chen [14]. P P Now, a BSN is simply an RBN of the same size and an n  n RBN consists of two n  n RBNs followed by an n  n P P 8 CONCLUSIONS merging network (see Fig. 5). In the feedback version of the network, after the first split according to the most In this paper, we have proposed a design for a new significant bit of a destination address, we can reuse the self-routing multicast network using an approach based two n  n RBNs in the n  n RBN as the two BSNs to on recursive decompositions of multicast networks. P P perform the second split according to the second most Different from earlier proposed multicast networks, the significant bit of the destination address, and so on. Consequently, the feedback version of an n  n BRSMN is simply an n  n RBN. 7.4 Complexity Analyses In this subsection, we analyze the hardware cost, network depth and routing time of the new multicast network. Compared to Chen and Cheng's permutation network [14], since we add only a constant cost (i.e., a constant number of one bit adders or adder-like circuits) to each switch for the self-routing circuit and there are n log n P switches in an n  n RBN, the hardware cost of the RBN is y…n log n†, and so is that of an n  n BSN. Let g…n† denote the cost of an n  n BRSMN. By the recursive À Á construction in Fig. 1, we have g…n† ˆ y…n log n† ‡ Pg n Y P which implies g…n† ˆ y…n logP n†. Since the network depth of an n  n RBN is y…log n†, so is an n  n BSN. Let h…n† denote the network Á À depth for an n  n BRSMN. From h…n† ˆ y…log n† ‡ h n Y weP immediately have h…n† ˆ y…logP n†. Fig. 13. The feedback implementation of the network.

×